As someone who hasn't used Usenet for downloading content, why does it seem that a few central indexing services are how people access this medium? Wouldn't it be possible - and preferable - to distribute the index through p2p? That way, there would be no single point of failure.
The index would be insanely huge. A lot of these indexing sites are used for their API. Dedicated applications like SickBeard automatically compile lists of wanted files (movies, TV rips and music) and use them to search against the provider. The provider returns a file (NZB) that contains references to hundreds of encoded files hosted on newsgroups.
Here's a breakdown of one NZB, for a single 3.88GB file.
Inside are 74 RAR files, and 15 additional PAR files containing parity for recovering corrupted data.
Each of these 89 files is split into 37 parts of 1.4MB for ~3300 pieces total.
These pieces are then encoded with yEnc (a binary-to-text encoding, similar in purpose to base64) and uploaded to the newsgroup.
Every single one of these yEnc pieces is referenced in the NZB file, which brings the NZB itself to about 400KB. Just because of this, the NZB indexes are enormous; hundreds of gigabytes, probably terabytes in some cases.
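To put rough numbers on that, here's a back-of-the-envelope sketch in Python. The file counts and the 400KB figure come from the breakdown above; the one million NZBs in the usage note is purely a hypothetical figure for illustration, not a measurement of any real index.

```python
# Figures taken from the breakdown above.
RAR_FILES = 74
PAR_FILES = 15
PARTS_PER_FILE = 37
NZB_SIZE_KB = 400  # approximate size of the NZB itself

total_files = RAR_FILES + PAR_FILES            # files referenced by the NZB
total_pieces = total_files * PARTS_PER_FILE    # yEnc pieces listed in the NZB

# Average number of bytes the NZB spends describing one piece.
bytes_per_piece_entry = NZB_SIZE_KB * 1000 / total_pieces


def index_size_gb(num_nzbs, nzb_kb=NZB_SIZE_KB):
    """Rough size in GB of an index holding num_nzbs NZBs of similar size."""
    return num_nzbs * nzb_kb / 1_000_000


if __name__ == "__main__":
    print(total_files, total_pieces, round(bytes_per_piece_entry))
    # Hypothetical: one million NZBs of about this size.
    print(index_size_gb(1_000_000), "GB")
```

89 × 37 works out to 3,293 pieces, so each piece costs a bit over a hundred bytes of NZB, and a million NZBs of this size would already be in the hundreds of gigabytes.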
There's no need for everybody to serve the entire index. Set up a DHT and let interested parties exchange the NZBs. All problems in computer science can be solved by another level of indirection.
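A toy sketch of how a DHT could spread the NZBs around so that nobody holds the whole index. This assumes a Kademlia-style XOR distance, which is my choice for illustration and not something anyone here specified; the node names and the release key are made up.

```python
import hashlib


def key_id(text: str) -> int:
    """Map a node name or an NZB key onto the same 160-bit ID space."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


def closest_nodes(nzb_key: str, node_names, k=2):
    """Return the k nodes whose IDs are nearest (by XOR) to the key's ID.

    Those nodes would be the ones asked to store and serve that NZB.
    """
    target = key_id(nzb_key)
    return sorted(node_names, key=lambda name: key_id(name) ^ target)[:k]


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical peers
    print(closest_nodes("some.release.name.nzb", nodes))
```

Each peer only stores the NZBs whose keys land near its own ID, and anyone can recompute which peers to ask for a given key.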
DebTorrent was to be a set of BitTorrent extensions to allow the Debian archive to be syndicated as one ever-growing torrent. Seems similar to the use case here, and free from the problem we have now, which is the huge indexes getting hit by takedown notices.
http://wiki.debian.org/DebTorrent
It's a reputational and data quality issue as well. Completely decentralized content distribution doesn't usually work unless there is some kind of curation; there are just too many motivations for disruption. It's not an impossible problem, but it long ago ceased being mostly about the technical occupation of shipping bits between nodes.
You could always have sites that only serve up checksums of NZB files (e.g. "NZB file => SHA-256"). Then you could use these to determine which NZB files to trust. I would assume that distributing an "NZB => checksum" mapping would be a lot smaller/easier than running an entire site on top of a terabyte of NZB files.
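Roughly what the client side of that could look like, as a minimal sketch. The manifest format (a plain dict of name to hex digest) and the function names are assumptions of mine, not an existing site's API.

```python
import hashlib


def nzb_digest(nzb_bytes: bytes) -> str:
    """SHA-256 of the raw NZB contents, as a 64-character hex string."""
    return hashlib.sha256(nzb_bytes).hexdigest()


def is_trusted(name: str, nzb_bytes: bytes, manifest: dict) -> bool:
    """True only if the curated manifest lists this name with a matching digest."""
    expected = manifest.get(name)
    return expected is not None and expected == nzb_digest(nzb_bytes)


if __name__ == "__main__":
    good = b"<nzb>...contents fetched from some peer...</nzb>"  # placeholder data
    manifest = {"example.nzb": nzb_digest(good)}  # published by the curating site
    print(is_trusted("example.nzb", good, manifest))
    print(is_trusted("example.nzb", good + b"tampered", manifest))
```

The curating site only has to publish 64 hex characters per NZB, while the NZB itself can come from any untrusted peer.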
That's usually where the indexing sites get them from in the first place. The issue comes when you're a client that's not indexing continuously; downloading billions of headers to try and find a particular NZB is a mammoth task.
> downloading billions of headers to try and find a
> particular NZB is a mammoth task
This is true. Even ~10 years ago, downloading just the headers on some of the newsgroups shot my cache directory into the gigabytes (for a single newsgroup).