Seems like this only helps large (heavy) websites with consistent content. The real world examples are all large, like YouTube, Amazon, etc...
Small JSON responses that compress to <1k would fit in a single packet, so I don't see the advantage of going from "65 bytes with normal Zstandard compression, vs 28 bytes when using the past response as a dictionary - 57% smaller."
Yes, I was under the same impression, but I think this LUT/dictionary solution is counterintuitive to both of our current understandings of the web.
The "aha" moment for me was that, without this dict, the user always requests a full download of the data. For instance, let's say the NYT publishes an article and you read it. Then an editor's note is added to the article. When you go back to read the article, the data transfer is minuscule. Now that is an edge case, but imagine a website that allows comments: Twitter, Reddit, small text-based pages that at first seem inconsequential until you think about how we use the web, with millions of users returning to pages over and over again.
For me, my mental model of this structure is a LUT(key/value pair) wrapped in a Version Control(hash).
Now I think your comment is correct if we factor in how many requests the webpage is receiving and how frequently changes are happening to said webpage. My blog would receive no benefit from implementing this tech; using napkin math, my blog would need 1000 days to break even. Microsoft's blog, however... less than a day, in theory.
If the version control hash changes you have to re-download the dictionary, which is similar to redownloading the whole page.
Reddit/NYT would have to publish their changes without changing the dictionary, meaning some portions would be largely absent from the dictionary and have worse compression than gzip. Probably fine for NYT, something like Reddit might actually have worse ratios than gzip in that case.
Maybe? That gets sort of awkward for frequently updated things like Reddit where there might be 10 dictionary versions between what you have and the current version. You’d need something that decides whether to get an incremental update or a new dictionary, and the hoster has to store those old dictionaries. Feels like more trouble than it’s worth.
You could compress things with gzip if the dictionary doesn’t work well, but to my understanding gzip compresses repetition. There’s less repetition in smaller chunks, so worse compression ratios. Eg compressing each comment individually has a worse net ratio than compressing all the comments at once.
It would also be annoying to merge a bunch of individually compressed blocks back together, but certainly an option
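A rough way to check the per-comment vs. all-at-once claim, using Python's stdlib `zlib` (DEFLATE, the same algorithm gzip uses). The comment bodies here are made up for illustration, so the exact numbers mean nothing; only the direction of the difference matters.

```python
import zlib

# Hypothetical comment payloads; real threads are longer and more varied.
comments = [
    (
        f'{{"id": {i}, "author": "user{i}", '
        f'"body": "I think comment number {i} makes a fair point."}}'
    ).encode()
    for i in range(50)
]

# Compress every comment on its own and add up the sizes.
individual_total = sum(len(zlib.compress(c)) for c in comments)

# Compress all comments as one stream, so later comments can
# back-reference the repetition in earlier ones.
combined_total = len(zlib.compress(b"\n".join(comments)))

print("individually:", individual_total, "bytes")
print("all at once: ", combined_total, "bytes")
```

Each tiny block also pays its own header and checksum overhead, which is part of why the individual total comes out worse.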
I’m pretty sure the dictionary just gets put on the front of the compression algorithm’s “context” so that it can be referenced just like any other part of the document. You wouldn’t need individual blocks with different compression schemes, it would all get compressed together.
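To illustrate the "dictionary is just prepended context" idea, here is a minimal sketch using the preset-dictionary support in Python's stdlib `zlib` (the `zdict` argument). This is DEFLATE rather than Zstandard or Brotli, and the two JSON payloads are invented, so treat it as an analogy for the mechanism, not a reproduction of the 65 vs. 28 byte figures.

```python
import zlib
from typing import Optional

# Hypothetical previous and current responses from the same endpoint.
previous = b'{"user": "alice", "status": "online", "unread": 3, "theme": "dark"}'
current = b'{"user": "alice", "status": "away", "unread": 4, "theme": "dark"}'


def deflate(data: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Compress data, optionally priming the window with a preset dictionary."""
    comp = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return comp.compress(data) + comp.flush()


def inflate(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Decompress; the same dictionary must be supplied if one was used."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


plain = deflate(current)
with_dict = deflate(current, zdict=previous)

print("no dictionary:  ", len(plain), "bytes")
print("with dictionary:", len(with_dict), "bytes")
```

The whole response is compressed as one stream; the dictionary only gives the compressor something to point back at from the very first byte, which is why no per-block scheme is needed.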
The toy examples aren't where the savings are. Do the calculations with a JSON list of 100 objects and you'll find that the compression gains increase much more significantly.
So yeah, a one-time return of a single object isn't impressive. Once those objects are in an array, there's a much more remarkable compression.
While reading, I started wondering if we'll see an LLM constructor that'll take an API and some actual browser use and create a model that maximizes these types of message-centric compression.
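A quick sketch of that calculation with stdlib `zlib` and a made-up object shape (the field names are hypothetical). It compares the compressed/raw size ratio of one object against an array of 100 similar objects.

```python
import json
import zlib


def make_obj(i: int) -> dict:
    # Hypothetical record shape; any repetitive JSON schema behaves similarly.
    return {"id": i, "name": f"item-{i}", "active": i % 2 == 0, "tags": ["a", "b"]}


def ratio(raw: bytes) -> float:
    """Compressed size divided by raw size (lower is better)."""
    return len(zlib.compress(raw)) / len(raw)


single = json.dumps(make_obj(0)).encode()
many = json.dumps([make_obj(i) for i in range(100)]).encode()

single_ratio = ratio(single)
many_ratio = ratio(many)

print(f"1 object:    {len(single)} bytes raw, ratio {single_ratio:.2f}")
print(f"100 objects: {len(many)} bytes raw, ratio {many_ratio:.2f}")
```

The repeated keys in the array are exactly the kind of redundancy a compressor (or a shared dictionary) can exploit, so the ratio drops sharply once there is more than one object.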
Private mode hides your history. It does not protect you from malicious extensions, phishing lookalikes, or social engineering that happens in the same browser where you do everything else.
Haven creates a dedicated environment that verifies the real site and blocks the tricks that normally slip through. It is a space where your money tasks stay isolated and safe.
On the technical side, we built Haven as an Electron app, which keeps activity separate from the browser’s normal attack surface. The underlying attack vectors are different, so malware has a much harder time reaching this environment.
> - Find a good dentist within 2mi from my house, call them to make sure they take my insurance, and book an appointment sometime in the next two weeks no earlier than 11am
The web caused dentists to make websites, but they don't post their appointment calendar; they don't have to.
Will AI looking for appointments cause businesses to post live, structured data (like calendars)? The complexity of scheduling and multiple calendars is perfect for an AI solution. What other AI uses and interactive systems will come soon?
- Accounting: generate balance sheets, audit in real-time, and have human accountants double-check them (rather than doing the work themselves)
- Correspondence: create and send notifications of all sorts, and consume them
- Purchase selection: shifting the lack of knowledge about products in the customer's favor
The problem is that we're reverting back to the stone age by throwing unnecessary resources at problems that have a simple and effective solution: open, standardised, and accessible APIs.
We wouldn't need to use an expensive (compute-wise) AI agent to do things like making appointments. Especially if in the end you'd end up with bots talking to bots anyway. The digital equivalent of always up-to-date yellow pages would solve many of these issues. Super simple and "dumb" but reliable programs could perform such tasks.
Scheduling multiple calendars doesn't require "AI" - it's a comparatively simple optimisation problem that can be solved using computationally cheap existing algorithms. It seems more and more to me that AI - and LLMs in particular - are the hammer and now literally everything looks like a nail...
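As a rough illustration of how cheap the basic version of this is, here is a sketch that merges busy intervals from several calendars and returns the common free slots. The data format (minutes since midnight), the function names, and the sample calendars are all assumptions for the example; real scheduling adds time zones, recurrence, and preferences on top.

```python
from typing import List, Tuple

# (start, end) in minutes since midnight; a hypothetical format for this sketch.
Interval = Tuple[int, int]


def merge_busy(calendars: List[List[Interval]]) -> List[Interval]:
    """Flatten all calendars and merge overlapping busy intervals."""
    merged: List[Interval] = []
    for start, end in sorted(iv for cal in calendars for iv in cal):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def free_slots(
    calendars: List[List[Interval]], day_start: int, day_end: int, min_length: int
) -> List[Interval]:
    """Return gaps of at least min_length minutes where nobody is busy."""
    slots: List[Interval] = []
    cursor = day_start
    for start, end in merge_busy(calendars):
        # Clip each busy interval to the window we care about.
        start = max(day_start, min(start, day_end))
        end = max(day_start, min(end, day_end))
        if start - cursor >= min_length:
            slots.append((cursor, start))
        cursor = max(cursor, end)
    if day_end - cursor >= min_length:
        slots.append((cursor, day_end))
    return slots


# Hypothetical calendars: patient and dentist, searching 11:00-17:00.
patient = [(9 * 60, 12 * 60), (14 * 60, 15 * 60)]
dentist = [(11 * 60, 11 * 60 + 30), (13 * 60, 14 * 60 + 30)]
slots = free_slots([patient, dentist], 11 * 60, 17 * 60, 30)
print(slots)  # [(720, 780), (900, 1020)], i.e. 12:00-13:00 and 15:00-17:00
```

Sorting dominates the cost, so this runs in O(n log n) over the total number of busy intervals.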
> Is there a model could we create to bolster out the middle?
Extend Copyright? (no, no..)
I have two ideas:
- Recommendations. Publishers connect with private LLM/Agents for custom recommendations. They'd need to keep reviews private, but could trade them among themselves.
- Insurance Pool. Authors could add works to a pool of books, and the profits could be split. Publishers would need to maintain the quality of books or authors won't join.
You mean like an authorship co-op? Might work, but the main problem is authors' self-importance. Not a single author I know of would opt for it. They are all just impoverished millionaires.
For less narcissistic authors it may well work, though. Will pitch it, thanks for the idea!