A 4x improvement over lzma is an extraordinary claim. I see the author also gives a result after preprocessing with lrzip (which removes long-range redundancies in large files), and there the gap is much smaller (though bzip3 still wins). Does the amazing result without lrzip mean bzip3 is somehow exploiting some of that long-range redundancy natively?
I’d be astonished if such a 4x result generalized to tarballs that aren’t mostly duplicated files.
I'm currently running my own benchmarks, and my preliminary results suggest that zstd becomes competitive again once you add the `--long` option (i.e. `zstd --long -16 all.tar` instead of `zstd -16 all.tar`). Not everyone may be aware of that option, but its usefulness should be intuitive for this benchmark of >200 very similar files.
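To see why the window size matters so much here, a toy sketch (my own illustration, not the benchmark, and not using zstd at all): zlib's 32 KiB window and xz/LZMA's 8 MiB default dictionary stand in for zstd without and with `--long`, which raises the window to 128 MiB by default. Each "archive" is just one random file repeated 20 times; the file sizes and copy count are arbitrary assumptions, and real tarballs also have headers and small differences between files.

```python
import lzma
import os
import zlib


def make_archive(file_size: int, copies: int) -> bytes:
    """Toy stand-in for a tarball of near-identical files: one random
    'file' repeated back to back (ignores tar headers and per-file diffs)."""
    one_file = os.urandom(file_size)  # random, so incompressible on its own
    return one_file * copies


def ratio(compressed: bytes, original: bytes) -> float:
    """Compressed size as a fraction of the original size."""
    return len(compressed) / len(original)


if __name__ == "__main__":
    for file_size in (16 * 1024, 64 * 1024):
        data = make_archive(file_size, copies=20)
        # DEFLATE (zlib) can only reference matches at most 32 KiB back,
        # so it sees the previous copy only when file_size < 32 KiB.
        z = ratio(zlib.compress(data, 9), data)
        # xz/LZMA2 at the default preset (6) uses an 8 MiB dictionary,
        # so the previous copy is always within reach in this toy case.
        x = ratio(lzma.compress(data), data)
        print(f"file size {file_size // 1024:>3} KiB: zlib {z:.3f}, lzma {x:.3f}")
```

Running it, zlib should compress the 16 KiB case very well but leave the 64 KiB case essentially uncompressed, while lzma handles both: once the duplicate sits farther back than the window, the compressor simply cannot see it.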