Improving BWT is great!

In my view, improving "long range" compression has the biggest potential. There are many, many algorithms and implementations for very short range (Huffman, arithmetic, ANS) and short range (LZ, BWT), but not much research has gone into "long range" yet. There's deduplication, and large-window LZ / BWT, but not much more. What is missing is a way to efficiently (and with little memory) find similarities in multi-GB data sets. I think sorting by similarity would help there. Or did I miss research in this area?
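
To make the "sorting by similarity" idea a bit more concrete, here is a rough Python sketch of what I have in mind. It is a toy, not a tested design: the function names (`min_shingle_hash`, `order_by_similarity`), the fixed 4096-byte blocks, and the 8-byte window are all my own assumptions for illustration. Each block gets a single MinHash-style key (the smallest CRC32 over its 8-byte windows), and blocks are then sorted by that key, so blocks that share most of their content tend to end up next to each other.

```python
import zlib


def min_shingle_hash(block: bytes, k: int = 8) -> int:
    """Smallest CRC32 over all k-byte windows of the block.

    This is a one-value MinHash: two blocks that share most of their
    k-byte windows are likely (not guaranteed) to get the same key.
    """
    if len(block) < k:
        return zlib.crc32(block)
    return min(zlib.crc32(block[i:i + k]) for i in range(len(block) - k + 1))


def order_by_similarity(data: bytes, block_size: int = 4096) -> list[int]:
    """Return block indices sorted by their min-hash key.

    Only one small integer per block is kept as the sort key.
    The returned permutation would have to be stored alongside the
    compressed data so the original order can be restored.
    """
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    keys = [min_shingle_hash(b) for b in blocks]
    # sorted() is stable, so blocks with equal keys keep their relative order.
    return sorted(range(len(blocks)), key=lambda j: keys[j])
```

Feeding the blocks to a regular short-range compressor in the returned order puts similar blocks within reach of a small window. For real multi-GB inputs one would presumably want a rolling hash instead of recomputing CRC32 per window, and probably content-defined block boundaries, but that is beyond this sketch.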