
> As developers, we need to be better at handling edge cases like out of disk space, out of memory, pegged bandwidth and pegged CPU

In what situation though? Let's consider disk space. This certainly does not apply to all developers or all programs. Making your program understand the fact that the system has no space left does not seem like something that would be very productive in the vast majority of cases. Like running out of memory, it is not something the program can recover from all by itself unless it knows it created temporary files somewhere that it could go and delete. If that scenario does in fact apply to your program, then it's not even an edge case: the program should be deleting temporary files if it doesn't need them anymore. If the P3 was created to add support for that exact function, then I agree that it should be acted upon. A P3 is fine as long as it's reached. If you don't reach your P3s ever, then there are different issues that need addressing. I'd even say for something littering users' disks it should be higher than a P3, but the point is it's a specific case where it makes sense to handle that error. In every other case, your best bet is a _generic_ exception handler for write operations that will catch any failure and inform the user (e.g. "[Errno 28] No space left on device"), but that's something that should already be a habit.
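To make the "generic handler" idea concrete, here is a minimal Python sketch. The function name `save_bytes` and its return shape are made up for illustration. It treats every `OSError` the same way and surfaces the system's own message, which is where text like "[Errno 28] No space left on device" comes from.

```python
import os


def save_bytes(path, data):
    """Write data to path; return (ok, message) instead of raising.

    Hypothetical helper: one handler for every write failure.
    """
    try:
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            # Push the data to the device so late errors show up here.
            os.fsync(f.fileno())
    except OSError as exc:
        # Covers a full disk, bad permissions, a read-only filesystem,
        # I/O errors and so on. str(exc) carries the errno and its text.
        return False, f"Could not save {path}: {exc}"
    return True, f"Saved {path}"
```

The caller only needs to show the message; no errno is singled out.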

There are cases when you want to try to avoid running out of disk space because your program might know that it needs to consume a lot of it (e.g. installers) so it will be checked preemptively. Even then you probably do want to try to handle running out of disk space (e.g. in the unfortunate event that something else consumed the rest of your disk _after_ you preemptively calculated how much was required) so you can attempt a rollback and inform the user to try again.
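A rough sketch of that installer pattern in Python, assuming a hypothetical `install` function that takes a dict of file names to bytes. It checks free space up front with `shutil.disk_usage`, and still rolls back if a write fails later, since the check can be stale by the time the writes happen.

```python
import os
import shutil


def install(files, dest_dir):
    """Write files ({name: bytes}) into dest_dir, rolling back on failure.

    Hypothetical helper; assumes none of the names exist in dest_dir yet.
    """
    needed = sum(len(data) for data in files.values())
    free = shutil.disk_usage(dest_dir).free
    if free < needed:
        # Preemptive check: refuse before touching the disk.
        raise OSError(f"need {needed} bytes, only {free} free in {dest_dir}")

    written = []
    try:
        for name, data in files.items():
            path = os.path.join(dest_dir, name)
            # Record the path first so a partially written file is cleaned up too.
            written.append(path)
            with open(path, "wb") as f:
                f.write(data)
    except OSError:
        # Something else may have eaten the space after the check: roll back.
        for path in written:
            try:
                os.remove(path)
            except FileNotFoundError:
                pass
        raise
```

The re-raise leaves it to the caller to tell the user to free space and try again.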

Other than that, when else is that _specific_ error more important than knowing that the data just couldn't be written in general? Let's say you have a camera app that tries to save an image. Surely you'd have a generic exception handler for not being able to save the image, rather than a dedicated handler for "out of space", which seems oddly specific considering there are literally hundreds of errnos you could encounter that would prevent you from writing. I'm sure the user doesn't want to see something like "Looks like you're out of disk space. Do you want to try saving this image in lower quality instead?"

So my point in all of this is I agree that we should _consider_ the impact of disk space but it doesn't need to be prioritized by developers unless it's actually important like in the first few examples I gave.



It's important that you can recover from this condition.

For example, I'm working on an NVR project. It has a SQLite database that should be placed on your SSD-based root filesystem, and it puts video frames on spinning disks. It's essentially a specialized DBMS. You should never touch its data except through its interface.

If you misconfigure it, it will fill the spinning disks and stall. No surprise there. The logical thing for the admin to do is stop it, go into the config tool, reduce the retention, and restart. (Eventually I'd like to be able to reconfigure a running system but for now this is fine.)

But...in an earlier version, this wouldn't work. On startup it updates a small metadata file in each video dir to help catch accidents like starting with an older version of the db than the dir or vice versa. It used to do this by writing a new metadata file and then renaming it into place. On a full disk there was no room for the new file, so startup failed and you couldn't delete anything. Ugh.
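For anyone who hasn't seen the write-then-rename pattern, a minimal Python sketch follows. This is not the project's actual code; `replace_atomically` and the `.tmp` suffix are illustrative. The point is that the temporary file has to be allocated before the rename, so on a full disk it fails at the first step.

```python
import os


def replace_atomically(path, data):
    """Replace path with data via a temporary file and a rename."""
    tmp = path + ".tmp"  # hypothetical naming convention
    # Creating and filling the temp file needs free blocks; on a full disk
    # this is where the write fails with "No space left on device".
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    # Readers see either the old file or the new one, never a partial file.
    os.replace(tmp, path)
```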

I fixed it through a(nother) variation of preallocation. Now the metadata files are a fixed 512 bytes. I just overwrite them directly, assuming the filesystem/block/hardware layers offer atomic writes of this size. I'm not sure this assumption is entirely true (you really can't find an authoritative list of filesystem guarantees, unfortunately), but it's more true than assuming disks never fill.
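Roughly what a fixed-size in-place overwrite looks like, sketched in Python with made-up names (`create_meta`, `overwrite_meta`, `META_SIZE`). The file is allocated once at its full size; later updates write the same 512 bytes at offset 0 without creating or truncating anything. `os.pwrite` is Unix-only, and whether the write is atomic is exactly the unverified assumption described above.

```python
import os

META_SIZE = 512  # fixed size taken from the description above


def create_meta(path):
    """Allocate the metadata file once, while space is still available."""
    with open(path, "wb") as f:
        f.write(b"\0" * META_SIZE)
        f.flush()
        os.fsync(f.fileno())


def overwrite_meta(path, payload):
    """Overwrite the existing file in place, padded to META_SIZE bytes."""
    if len(payload) > META_SIZE:
        raise ValueError(f"payload is {len(payload)} bytes, limit is {META_SIZE}")
    block = payload.ljust(META_SIZE, b"\0")
    # No O_CREAT and no O_TRUNC: the file length never changes.
    fd = os.open(path, os.O_WRONLY)
    try:
        written = os.pwrite(fd, block, 0)
        if written != META_SIZE:
            raise OSError("short write to metadata file")
        os.fsync(fd)
    finally:
        os.close(fd)
```

On copy-on-write filesystems an overwrite may still need fresh blocks, so this is a sketch under assumptions, not a guarantee.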

It might also not start if your root filesystem is full because it expects to be able to run SQLite transactions, which might grow the database or WAL. I'm not as concerned about this. The SQLite db is normally relatively small and you should have other options for freeing space on the root filesystem. Certainly you could keep a delete-me file around as the author does.
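The delete-me file idea can be sketched in a few lines of Python. The names `create_ballast` and `release_ballast` are hypothetical. The file is filled with real bytes rather than extended with a truncate call, since a sparse file might not actually reserve any space.

```python
import os


def create_ballast(path, size):
    """Reserve size bytes by writing a file of real (non-sparse) zeros."""
    chunk = b"\0" * (64 * 1024)
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(remaining, len(chunk))
            f.write(chunk[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


def release_ballast(path):
    """Delete the ballast file; return how many bytes it was holding."""
    try:
        size = os.path.getsize(path)
        os.remove(path)
    except FileNotFoundError:
        return 0
    return size
```

When the disk fills, deleting the file gives the admin enough headroom to run the cleanup that fixes the real problem.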



