Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I'm surprised you don't mention insertion/deletion for why plain mmap (without chunk lists or trees) does not work... isn't that a deal breaker regardless, or am I missing something?

EDIT: Ah, did you just mean why it doesn't use mmap at all, e.g. backing the actual data in the tree?



Yep. A simple no-insertion editor could just mmap a file with in-place editing. But you could support insertions by weaving together the mmap'd buffer with other buffers. This is how the suckless vis editor works: https://lists.suckless.org/dev/1409/23497.html

The advantage is that your document can be a simple list of (ptr, length) pairs. The price you pay is error handling: mmap reports errors through raising SIGSEGV or SIGBUS, which are very painful to handle properly.


Yes, a document can just be a (ptr, length). I guess I don't see the benefit a b-tree adds here. I'd probably just use a red-black, AVL, even binary tree with nodes either pointing to blocks on the file (ptr, length) or nodes containing new data in memory (ptr, length, data) with the invariant that no two nodes can ever have overlapping (ptr, length).

My understand is that btrees are a multi-level index that reduces the number of blocks that need to be read from disk to manage an index. I don't see that being a use-case here and thought maybe I was missing something.

With that said, I wonder if the calculus changes if the file is so large that it won't fit in memory and thus, b-trees maybe somehow helpful in flushing data to disk. However, this then starts to look like a block allocator problem. Suppose as you run out of memory you start flushing blocks to disk. But the user is still editing the file and so some of these blocks that were flushed to disk are no longer valid. Youu'd hve to keep track of those "unallocated" blocks on disk to enable re-use thus improve space efficiency. As far as I understand, block allocators are not solved with btrees.


Ah, yes. That would indeed still be a chunk list, albeit one backed by mmap.

I can of course be totally wrong, but I think the OP was wondering why you couldn't simply mmap the entire file in place without managing any lists, trees, or any data structure except for a flat array representing the whole file.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: