To make the web distributed-archive-friendly, I think we need to start referencing things by hash rather than by a path which some server has implied it will serve consistently, but which actually shows you different data at different times for a million different reasons.
If different data always gets a different reference, it's easy to know if you have enough backups of it. If the same name gets you a pile of snapshots taken under different conditions, it's hard to be sure which of those are the thing that we'd want to back up for that particular name.
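The core idea is tiny; a toy sketch in Python, with plain sha256 standing in for whatever multihash a real system would use:

```python
import hashlib

def content_ref(data: bytes) -> str:
    # A content-addressed reference is derived from the bytes themselves,
    # so identical bytes always yield the same reference and any change
    # yields a different one.
    return "sha256-" + hashlib.sha256(data).hexdigest()

ref_a = content_ref(b"archived page, snapshot 1")
ref_b = content_ref(b"archived page, snapshot 2")

# Different data gets a different reference...
assert ref_a != ref_b
# ...and the same data always maps back to the same one, so counting
# how many backups exist for a given reference is unambiguous.
assert content_ref(b"archived page, snapshot 1") == ref_a
```

Contrast that with a URL, where the mapping from name to bytes is whatever the server feels like today.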
(this doc is 5-6 years old though, and I'm not sure what may have changed since then)
In my own (toy-scale) IPFS experiments a couple of years ago it was rather usable, but the software was utterly insane for operators and users, and if I were IA I would only consider it if I budgeted for a from-scratch rewrite (of the stuff in use). Nearly uncontrollable, unintrospectable, and high resource use for no apparent reason.
IPFS has shown that the protocol is fundamentally broken at the scale of growth they want to achieve, and it is already extremely slow as it is: it often takes several minutes to locate a single file.
The beauty is that IA could offer their own distribution of IPFS that uses their own DHT for example, and they could allow only public read access to it. This would solve the slow part of finding a file, for IA specifically. Then the actual transfers tend to be pretty quick with IPFS.
What's the point of using IPFS then? Others can still spread the file elsewhere and verify it's the correct one, by using the exact same ID of the file, although on two different networks. The beauty of content-addressing I guess.
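That cross-network verification is just re-hashing on receipt: it doesn't matter which network or mirror served the bytes. A minimal sketch (plain sha256 standing in for a real CID, which in IPFS also encodes codec and chunking, so the same file can get different CIDs under different settings):

```python
import hashlib

def verify(data: bytes, expected_ref: str) -> bool:
    # Recompute the reference from the bytes we actually received and
    # compare it to the reference we asked for.
    return "sha256-" + hashlib.sha256(data).hexdigest() == expected_ref

original = b"some archived file"
ref = "sha256-" + hashlib.sha256(original).hexdigest()

from_ia_network = original       # hypothetically fetched from IA's own DHT
from_public_network = original   # hypothetically fetched from the public swarm
tampered = b"some other bytes"   # a corrupted or malicious copy

assert verify(from_ia_network, ref)
assert verify(from_public_network, ref)
assert not verify(tampered, ref)
```

The two "fetched" variables are stand-ins here; the point is only that the check is the same regardless of where the bytes came from.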
That isn’t solving the problem, it’s just giving them more of it to work on. IA has enough material that I’d be surprised if they didn’t hit IPFS’s design limits on their own, and they’d likely need to change the design in ways which would be hard to get upstream.
There was a startup called Space Monkey that sold NAS drives where you got a portion of the space and the rest was used for copies of other people’s content (encrypted). The idea was you could lose your device, plug in a new one and restore from the cloud. They ended up folding before any of their resilience claims could be tested (at least by me).
Would people be willing to buy an IA box that hosted a shard of random content along with the things they wanted themselves?
Does anyone remember wua.la? It worked similarly, in that you offered local disk space in exchange for cloud storage. It was later bought by LaCie and killed off shortly after.