Hacker News

> offset the weight of all the unique entries in your dedup table

Didn't read the 7000 words... but isn't the dedup table in the form of a bunch of Bloom filters, so the whole table can be stored at roughly one bit per block?

When the filter flags a block as a likely duplicate, you can record just those candidate blocks in a table and then find all the real duplicates in a single scan later.

That avoids the massive accounting overhead of storing per-block metadata.
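The scheme described above could be sketched roughly like this (a hypothetical illustration, not any real filesystem's dedup implementation): a Bloom filter sized at about one bit per expected block, where `add()` returns whether a block's hash positions were already all set, i.e. whether it is a *likely* duplicate worth recording for a later confirming scan.

```python
import hashlib

class BloomFilter:
    """Probabilistic set membership at ~1 bit per expected entry.

    False positives are possible (a block wrongly flagged as seen);
    false negatives are not. Sizing and hash count here are
    illustrative, not tuned.
    """

    def __init__(self, num_bits: int, num_hashes: int = 2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, data: bytes):
        # Derive independent hash positions by personalizing blake2b.
        for i in range(self.num_hashes):
            h = hashlib.blake2b(data, person=i.to_bytes(8, "little")).digest()
            yield int.from_bytes(h[:8], "little") % self.num_bits

    def add(self, data: bytes) -> bool:
        """Record a block; return True if it was *probably* seen before."""
        seen = True
        for pos in self._positions(data):
            byte, bit = divmod(pos, 8)
            if not (self.bits[byte] >> bit) & 1:
                seen = False  # at least one bit was unset: definitely new
                self.bits[byte] |= 1 << bit
        return seen

# Flag candidate duplicates in one pass; only these need a later
# confirming scan, so no per-block metadata is kept for the rest.
bf = BloomFilter(num_bits=1_000_000)
blocks = [b"block-a", b"block-b", b"block-a"]
candidates = [i for i, blk in enumerate(blocks) if bf.add(blk)]
```

The confirming scan then compares only the flagged candidates byte-for-byte (or by full hash), which bounds the cost of the filter's occasional false positives.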


