
I have found it irritating how, in recent years, it has become popular in the community to say that if a project doesn't have recent commits or releases, something must be seriously wrong. This is a toxic attitude. There was nothing wrong with "unmaintained" lzma two years ago. The math of the lzma algorithm doesn't change. The library was "done", and that's ok. The whiny mailing list post from the sock puppet complaining about the lack of speedy releases, which was little more than an ad hominem attack on the part-time maintainer, is all too typical, and we shouldn't assume those people are "right" or that their opinion has any validity.


> The math of the lzma algorithm doesn't change. The library was "done" and that's ok.

Playing devil's advocate: the math doesn't change, but the environment around it does. Just off the top of my head, we have: the 32-bit to 64-bit transition, the removal of pre-C89 support (https://fedoraproject.org/wiki/Changes/PortingToModernC) which requires an autotools update, the periodic tightening of undefined behaviors, new architectures like RISC-V, the increasing number of cores and a slowdown in the increase of per-core speed, the periodic release of new and exciting vector instructions, and exotic security features like CHERI which require more care with things like pointer provenance.
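
To make the undefined-behavior point concrete, here's a toy example of my own (not from xz): a once-common signed-overflow check that today's optimizers may legally delete, so code that was "done" years ago can quietly change behavior when rebuilt.

    #include <limits.h>
    #include <stdio.h>

    /* Old idiom (assuming b > 0): relies on signed wraparound, which is
       undefined behavior, so the compiler may fold the check to "false". */
    static int overflows_old(int a, int b)
    {
        return a + b < a;
    }

    /* Well-defined replacement: compare against INT_MAX before adding. */
    static int overflows_new(int a, int b)
    {
        return b > 0 && a > INT_MAX - b;
    }

    int main(void)
    {
        /* With optimizations on, the first result may be 0 even though
           the addition really would overflow. */
        printf("old: %d, new: %d\n",
               overflows_old(INT_MAX, 1), overflows_new(INT_MAX, 1));
        return 0;
    }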


> the 32-bit to 64-bit transition

Lzma is from 2010. AMD64 became mainstream in the mid-2000s.

> removal of pre-C89 support

Ibid. Also, at the library API level, C89-compatible code still compiles fine with C99 and later.

> new architectures like RISC-V

Shouldn't matter for portable C code?

> the increasing number of cores and a slowdown in the increase of per-core speed,

IIRC parallelism was already a focus of this library in the 2010s; I don't think it really needs a lot of work in that area.


Actually, the new architectures are a big source of concern. As a maintainer of a large open source project, I often received pull requests for CPU architectures that I never had a chance to touch. So I cannot build the code, cannot run the tests, and do not understand most of the code. C/C++ themselves are portable, but libs like xz need to beat their competitors on performance, which means you may need to use model-specific SIMD instructions, query CPU cache size and topology, and work at a very low level. That code is not portable. When people add such code, they often also need to add tests, disable some existing tests conditionally, or tweak the build scripts. These are all risks.
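
For illustration, here's a tiny sketch of my own (not from xz) of the "ask the machine what it is" kind of code I mean. The sysconf names are glibc extensions and the builtin is GCC/Clang-only, which is exactly why such code isn't portable:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* How many cores can we spread work across? */
        printf("online cores: %ld\n", sysconf(_SC_NPROCESSORS_ONLN));

    #ifdef _SC_LEVEL1_DCACHE_LINESIZE
        /* Cache-line size matters for buffer alignment and false sharing. */
        printf("L1d line: %ld bytes\n", sysconf(_SC_LEVEL1_DCACHE_LINESIZE));
    #endif

    #if defined(__x86_64__) || defined(__i386__)
        /* Runtime check for the vector extension a hand-tuned path would need. */
        printf("avx2: %d\n", __builtin_cpu_supports("avx2"));
    #endif
        return 0;
    }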

No matter how smart you are, you cannot forecast the future. Many CPUs now have a heterogeneous configuration, meaning they have big cores and little cores. But do all the cores have the same capabilities? Is it possible that a CPU instruction is only available on some of the cores? What does that mean for a multithreaded application? Could 64-bit CPUs drop 32-bit support at the hardware level? Ten years ago you could not have predicted what is happening today.

Windows has a large compatibility layer, which allows you to run old code on the latest hardware and the latest Windows. It takes quite a lot of effort. Many applications would crash without the compatibility patches.


I am a former MS employee; I used to read the compatibility patches when I was bored at the office.

Anyway, liblzma does not "need" to outperform any "competition". If someone wants to work on performance optimization, it's completely fair to fork. Look at how many performance-oriented forks there are of libjpeg. The vanilla libjpeg still works.


And then that fork becomes more performant or feature-rich or secure (etc.), and it becomes preferred over the original code base, and all distributions switch to it, and we're back at square one.


Vanilla Python works fine, but conda is definitely more popular among data scientists.


Excellent point. I believe that comes from the corporate supply chain attack "response": the insistence on making hard rules about "currency" and "activity" and "is maintained" pushes this kind of crap.

Attackers know this as well. It doesn't take much to hang around various mailing lists and look for stuff like this: https://www.mail-archive.com/xz-devel@tukaani.org/msg00567.h...

> (Random user or sock puppet) Is XZ for Java still maintained?

> (Lasse) I haven't lost interest but my ability to care has been fairly limited mostly due to ...

> (Lasse) Recently I've worked off-list a bit with Jia Tan on XZ Utils and perhaps he will have a bigger role in the future, we'll see. It's also good to keep in mind that this is an unpaid hobby project

With a few years' worth of work by a team of 2-3 people (one writes and understands the code, one communicates, a few others pretend to be random users submitting ifunc patches, etc.), you can end up controlling the project and signing releases.
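
For anyone who hasn't run into them: an ifunc is a GNU/ELF mechanism where a resolver function runs inside the dynamic linker, before main(), and decides which implementation a symbol binds to. A minimal sketch of my own (not the actual xz patches; GNU toolchain on ELF only):

    #include <stdio.h>

    static int impl_generic(void) { return 1; }
    static int impl_fast(void)    { return 2; }

    /* Resolver: called by the loader at relocation time; returns the
       implementation the symbol 'which' should bind to. */
    static int (*resolve_which(void))(void)
    {
    #if defined(__x86_64__)
        __builtin_cpu_init();
        return __builtin_cpu_supports("avx2") ? impl_fast : impl_generic;
    #else
        return impl_generic;
    #endif
    }

    /* 'which' has no body of its own; the loader wires it up via the resolver. */
    int which(void) __attribute__((ifunc("resolve_which")));

    int main(void)
    {
        printf("%d\n", which());
        return 0;
    }

It's a legitimate performance mechanism, but because the resolver runs at load time in every process that links the library, it's also a very convenient place to hang a hook.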


> 7-Zip supports .xz and keeping its developer Igor Pavlov informed about format changes (including new filters) is important too.

I've always found that dev's name tilts me.

Funnily enough, the Chinese name was not what prompted the investigation; it was a performance issue.

Also, regarding the discussion of whether a distribution was targeted: Jia pushed Fedora to upgrade to 5.6.x, and Fedora is the precursor to RHEL.

Together with the backdoor not working when LANG is not set (USA).

Those are two details suggesting the target was the USA. Though either or both could've been part of the deception.


I mostly agree with you, but I think your argument is wrong. Last month I found a tiny bug in Unix's fgrep program (the bug poses no risk). The program implements the Aho-Corasick algorithm, which hasn't changed much over decades. Yet the bug was still there at least as late as the code's release in 4.4BSD. It is not much of a concern, since nowadays most fgrep programs are just an alias for grep; they do not use the old Unix code anymore. The old Unix code, and much of FreeBSD, really couldn't meet today's security standards. For example, many text processing programs are vulnerable to DoS attacks when processing well-crafted input strings. I agree with you that in many cases we really don't need to touch the old code. However, it is not just because the algorithm didn't change.


A software project has the features it implements, the capabilities it offers users, and the boundary between itself and the environment in which those features create value for the user by becoming capabilities.

The "accounting" features in the source code may be finished and bug-free, but if the outside world has changed and now the user can't install the software, or it won't run on their system, or it's not compatible with other current software, then the software system doesn't grant the capability "accounting," even though the features are "finished."

Nothing with a boundary is ever finished. Boundaries just keep the outside world from coming in too fast to handle. If you don't maintain them then eventually the system will be overwhelmed and fail, a little at a time, or all at once.


I feel like this narrative is especially untrue for things like lzma, where the only dependencies are memory and CPU and the code is written in a stable language like C. I've had similar experiences porting code for things like image formats, audio codecs, etc., where the interface is basically "decode this buffer into another buffer using math". In most cases you can plop that kind of library right in without any maintenance at all; it might be decades old, and it works. The kind of maintenance I would expect for that is around security holes. Once I patched an old library like that to handle the fact that the register keyword was deprecated.
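
By "decode this buffer into another buffer" I mean roughly this shape, sketched here against liblzma's single-call API from memory (error handling and output sizing abbreviated; check the headers for the exact contract):

    #include <lzma.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Decode an .xz buffer into 'out'; returns bytes written, 0 on error. */
    size_t decode_xz(const uint8_t *in, size_t in_size,
                     uint8_t *out, size_t out_size)
    {
        uint64_t memlimit = UINT64_MAX;   /* no memory cap in this sketch */
        size_t in_pos = 0, out_pos = 0;

        lzma_ret ret = lzma_stream_buffer_decode(&memlimit, 0, NULL,
                                                 in, &in_pos, in_size,
                                                 out, &out_pos, out_size);
        return ret == LZMA_OK ? out_pos : 0;
    }

That kind of interface is why these libraries tend to drop in cleanly even when they're decades old.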


C is not stable, and new CPU microarchitectures keep arriving. LZMA compression is not trivial either: the trade-offs made back then might not be the most useful ones now, so there are usually things that make sense to change even if the background math will be the same forever.

sure, churn and make-believe maintenance for the sake of feeling good is harmful. (and that's where the larger community comes in: distributions, power users, etc. we need to help good maintainers, and push back against bad ones. and yes, this is - of course - easier said than done.)


Smaller boundaries are likelier to need less maintenance, but nothing stands still. The reason you can run an ancient simple binary on newer systems is that someone has deliberately made that possible. People worked to make sure the environment around its boundary would stay the same instead of drifting randomly away with time—usually so doggedly (and thanklessly) that we can argue whether that stability was really a result of maintenance or just a fact of nature.


> The reason you can run an ancient simple binary on newer systems is that someone has deliberately made that possible.

I'm not talking about binaries. I'm talking about C sources. I've done the kind of work you're talking about. You're overestimating it.


I must have misread "plop that kind of library" as "plop that kind of binary" about five times. My bad.


Two popular and well-tested Rust YAML libraries were recently marked as unmaintained, and people are rushing to move away from them to brand-new projects because warnings went out about it.


> There was nothing wrong with "unmaintained" lzma two years ago.

Well, that's not exactly true. The first patch from Jia Tan is a minor documentation fix, and the second is a bugfix which, according to the commit message (by Collin), "breaks the decoder badly". There are a few more patches after that which fix real issues.

Mark Adler's zlib has been around for a lot longer than xz/liblzma, and there are still bugfixes to that, too.


I was just looking at headscale. Last release mid 2023.

I had immediately asked myself: is this even maintained anymore?

I think this is a very valid question to ask.



