I agree, nothing beats Nexus right now. In fact, even the AA FAQ recommends using it for recent papers. They technically include the Nexus dataset in AA, but it hasn't been updated since 2024.
Yep, let's accept the monstrous industries which lock down culture for money.
I for one support their efforts. The same way we store seeds in vaults deep in the depths of the earth, we should do this for digital content too, and without retaliation from any specific industry.
What would be better is to solve the root problem. These (illegal, somewhat legitimate) hoarding sites are most valuable for research literature which, given that it is publicly funded, should not be gated to begin with.
The consequence of treating only the symptoms is that illegitimate uses piggyback on it. Artistic literature that legitimately deserves protection gets hoarded as well.
Hard-working authors of clearly copyrightable works, typically novels and manuals, are seeing their work accessed free of royalties, all for the sake of freely distributing scientific literature.
It's impossible to make the distinction when the legitimate utility operates in a dark domain.
Yes, we should archive everything. And we should perhaps reform IP more broadly and re-think how we treat our culture. And we shouldn't expect retaliation.
But retaliation will happen, and I worry that it's going to pull down one of the most incredible archives along with it.
And recode(1) has full support for ISO-8859-*. As do iconv and Python 3's codecs module and encodings package. I'm pretty sure browsers can render pages in them, too. Firefox keeps rendering UTF-8 pages as if they were ISO-8859-1 encoded when I screw up setting the charset parameter on their Content-Type.
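To make the charset mix-up concrete, here is a minimal Python sketch of what happens when UTF-8 bytes are decoded as ISO-8859-1, and how to undo it. The sample string is just an illustration I picked; it only uses the standard `codecs` module and the built-in `str`/`bytes` methods.

```python
import codecs

# Sample text (an arbitrary choice for illustration).
text = "naïve café"
utf8_bytes = text.encode("utf-8")

# What a client displays if it assumes ISO-8859-1 for bytes that are really UTF-8.
mojibake = utf8_bytes.decode("iso-8859-1")

# Every byte value is a valid ISO-8859-1 code point, so the damage is reversible:
# re-encode with the wrong charset, then decode with the right one.
repaired = mojibake.encode("iso-8859-1").decode("utf-8")

# Check which parts of the ISO-8859 family this Python install can look up.
available = []
for part in range(1, 17):
    try:
        codecs.lookup(f"iso-8859-{part}")
        available.append(part)
    except LookupError:
        pass

print(mojibake)
print(repaired == text)
print(available)
```

The round trip only works because ISO-8859-1 maps all 256 byte values; with a charset that has unassigned bytes, the second step could fail or lose data.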
It seems incompatible with the idea that it's "Gone. Forever." Thinking again doesn't change that for me. The only thing that's gone is the exclusivity to a single proprietary-software vendor.
A simple case. Thanks to standards, Amigans can still use Usenet and IRC, and they can connect to Bitlbee.org for several choices of network. With Discord and such it's more difficult, but for Jabber there's no issue at all. Ditto with AmiSSL and Jabber or Gemini clients. They can reuse Amiga 4000 machines (or FPGA-based ones) to browse small sites and Gopher, connect to Bitlbee, and make tons of services usable again.
With Nerdfonts, these will be obsolete in future Unicode releases.
GNU Unifont and the unicode table might be backported to the Amiga.
With NerdFonts, you need to do twice the work.
Having played around with this sort of thing in the llama.cpp ecosystem when they added it a few weeks ago, I will say that it also helps if a) your models are tuned to output JSON and b) you prompt them to do so. Anything you can do to help the output fit the grammar helps.
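A rough sketch of what I mean by doing both: send a grammar and also ask for JSON in the prompt. Nothing here talks to a model. The `prompt`, `grammar` and `n_predict` field names follow my understanding of the llama.cpp server's completion request, so treat them as assumptions and check your version. The tiny GBNF grammar, `build_request` and `parse_reply` are hypothetical names made up for illustration.

```python
import json

# Hypothetical GBNF grammar: a single object of the form {"answer": "<string>"}.
GRAMMAR = r'''
root   ::= "{" ws "\"answer\"" ws ":" ws string ws "}"
string ::= "\"" [^"\\]* "\""
ws     ::= [ \t\n]*
'''

def build_request(question: str) -> dict:
    """Build a request body that pairs the grammar with a prompt asking for JSON."""
    prompt = (
        "Answer the question. Reply with JSON only, in the form "
        '{"answer": "<text>"}.\n'
        f"Question: {question}\n"
    )
    # Field names are assumed, not verified against any particular server version.
    return {"prompt": prompt, "grammar": GRAMMAR, "n_predict": 64}

def parse_reply(reply_text: str) -> str:
    """Parse the model's reply and return the answer, failing loudly if the shape is off."""
    data = json.loads(reply_text)
    if not isinstance(data, dict) or set(data) != {"answer"}:
        raise ValueError("reply does not match the expected shape")
    return data["answer"]

payload = build_request("What is 2+2?")
print(json.dumps(payload, indent=2))
print(parse_reply('{"answer": "4"}'))
```

The grammar guarantees the shape, while the prompt nudges the model toward tokens the grammar will accept anyway, so the sampler has less to fight against.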