Apache Arrow is trying to do something similar, using FlatBuffers to serialize with zero-copy and zero-parse semantics, and an index structure built on top of that.
Arrow has a different use case, I think. Lite3 / TRON is effectively more efficient JSON. Arrow uses an array per property, which allows zero-copy per-property access across TB-scale datasets, amongst other useful features; it's more like the core of a database.
A closer comparison would be to FlatBuffers, which is used by Arrow IPC; a major difference being that TRON is schemaless.
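To make the "array per property" point concrete, here is a rough pyarrow sketch (the column names and values are made up for illustration):

```python
import pyarrow as pa

# One Arrow array per property ("column"); each lives in its own
# contiguous buffer rather than being interleaved row by row.
table = pa.table({
    "user_id": pa.array([1, 2, 3, 4], type=pa.int64()),
    "score": pa.array([0.1, 0.9, 0.4, 0.7], type=pa.float32()),
})

# Grabbing or slicing one column is zero-copy: you get views over the
# existing buffers, with no per-row parsing.
scores = table.column("score")
first_two = scores.slice(0, 2)
print(first_two)
```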
My threshold for “does not need to be smaller” is “can this run on a Raspberry Pi”. This is a helpful benchmark for maximum likely useful optimization.
Curious about comparisons with Apache Arrow, which uses FlatBuffers to avoid memory copies during deserialization, is well supported by the Pandas ecosystem, and lets users serialize arrays of numbers with hardware support on GPUs (int8-64, float).
Apache Arrow is more of a memory format than a general‑purpose data serialization system. It’s great for in‑memory analytics and GPU‑friendly columnar storage.
Apache Fory, on the other hand, has its own wire‑stream format designed for sending data across processes or networks. Most of the code is focused on efficiently converting in‑memory objects into that stream format (and back) — with features like cross‑language support, circular reference handling, and schema evolution.
Fory also has a row format, which is a memory format, and can complement or compete with Arrow’s columnar format depending on the use case.
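For contrast, here's roughly what "serializing" Arrow data looks like. This is plain pyarrow, not Fory; it's just to show that Arrow's wire form (the IPC stream) is basically its memory format written out with FlatBuffers-framed metadata:

```python
import pyarrow as pa
import pyarrow.ipc as ipc

batch = pa.record_batch({"x": [1, 2, 3], "y": [0.5, 1.5, 2.5]})

# Arrow's IPC stream is essentially the in-memory buffers written out
# as-is, framed with FlatBuffers-encoded metadata describing the schema
# and buffer layout.
sink = pa.BufferOutputStream()
with ipc.new_stream(sink, batch.schema) as writer:
    writer.write_batch(batch)
buf = sink.getvalue()

# Reading maps those buffers back rather than parsing rows one by one.
reader = ipc.open_stream(buf)
print(reader.read_all())
```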
How are you a landlord if you're paying property taxes?
Once you have everything else set up, you can migrate to a server hosted on your own internet connection. Running your own data center is one of the trickier parts of the equation, compared to almost-free web hosting for a 10MB site.
You’re also only renting your internet connection!
If you want to be a real rent-seeker (sorry, meant to say “landlord”) you’ll need to purchase an AS and become a BGP-peering sovereign citizen cutting deals with backbone networks.
> ... you’ll need to purchase an AS and become a BGP-peering sovereign citizen cutting deals with backbone networks
Which is doable as an individual. One of my very best mates did just that: granted, he's got quite the networking skills, but he did it entirely on his own.
He'll even get 256 IPv4 addresses, but for those he was put on a long waiting list (I think he'll get them in one to two months, but he's been waiting for about a year): IPv4 addresses are the actual scarce, landlordy Internet resource!
Your buddy probably still has to pay to exchange traffic with the backbones; they only do it for free (settlement-free peering) if you're also a tier 1 ISP (see Lumen's requirements[1] and the others'[2]). Even massive ISPs like Comcast still have to pay for internet access[3] because they're not big enough to be a "peer" to the tier 1s.
PS. I think you're shadowbanned since this and your last 7 comments all showed up as [dead].
And even with a decentralized mesh network you rely on good behavior from your peer/local nodes. Turns out the only way to truly own land is when your network consists of 10.0.0.0/8.
It’s not a racket — the state does use its monopoly on violence to enforce your title to the land. Otherwise it would only be yours until someone bigger and stronger came by.
And the mob really does honor their protection racket, too. If some punk comes and tries messing with a store protected by the mob, the mob deals with the problem.
Yet nobody goes around looking to purchase protection from the mob either, do they? The key problem with the arrangement isn't that the protection isn't provisioned, it's that the entire arrangement is involuntary and forced upon the business owner through threat of violence, whether by the mob or the state.
Interesting! Text files in git can work for small sizes, like your 100MB.
That is what's known in FAISS as a "flat" index: just one thing after another. And obviously you can query the key-value store that is git by primary key, and do atomic updates as you'd expect. In SQL land this is an unindexed column: you can do primary-key lookups on the table, or you can scan every row to find what you want.
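For reference, that flat baseline is only a few lines of FAISS (the dimensionality and data here are made up):

```python
import numpy as np
import faiss

d = 1024                                          # embedding dimensionality (made up)
xb = np.random.rand(10_000, d).astype("float32")  # stored embeddings
xq = np.random.rand(5, d).astype("float32")       # query embeddings

# "Flat" = no index structure at all: every query is compared against
# every stored vector, exactly like scanning an unindexed column.
index = faiss.IndexFlatL2(d)
index.add(xb)
distances, ids = index.search(xq, 10)             # top-10 neighbors per query
```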
If you don't need fast query times, this could work great! You could also use SQL (maybe an AWS Aurora Postgres/MySQL table?) and stuff the fact and its embedding into a table, and get declarative relational queries (find me the 10 statements from users A-J closest to embedding [0.1, 0.2, -0.1, ...] within the past day). Lots of SQL databases are getting embedding search (Postgres, SQLite, and more), so your embedding search can happen in a few milliseconds instead of a few seconds.
It could be worth sketching out how to use SQLite for your application, instead of using files on disk: SQLite was designed to be a better alternative to opening a file (what happens if power goes out while you are writing a file? what happens if you want to update two people's records, and not get caught mid-update by another web app process?) and is very well supported by many language ecosystems.
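A minimal sketch of that SQLite direction (the table and column names are made up); the transaction is what buys you crash-safety and keeps other processes from seeing half-done updates:

```python
import sqlite3
import numpy as np

conn = sqlite3.connect("facts.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS facts (
        id        INTEGER PRIMARY KEY,
        user      TEXT NOT NULL,
        body      TEXT NOT NULL,
        embedding BLOB NOT NULL   -- float32 vector stored as raw bytes
    )
""")

vec = np.random.rand(1024).astype("float32")

# Either both statements land or neither does, even if the power goes out
# mid-write or another web-app process is reading at the same time.
with conn:
    conn.execute(
        "INSERT INTO facts (user, body, embedding) VALUES (?, ?, ?)",
        ("alice", "some fact", vec.tobytes()),
    )
    conn.execute(
        "UPDATE facts SET body = ? WHERE user = ?",
        ("corrected fact", "bob"),
    )
```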
Then, to take full advantage of vector-embedding engines, the questions start stacking up:
- My embedding is 1024 dimensions of 32-bit floats. Do I need to keep all of that precision? Is 16-bit okay? 8-bit floats?
- What about reducing the dimensionality?
- Is accuracy and recall still good enough if I represent each dimension with an index into a palette of the best 256 floats for that dimension? What about representing each pair of dimensions with an index into a palette of the best 256 pairs of floats for those two dimensions?
- What about, instead of scanning every embedding one by one, exploiting the fact that people talk about one of three topics: keep a separate index per topic, find your closest topic first (or maybe the closest two?), and then search only those smaller indices?
Each of these hypotheticals is literally a different "index string" in the embedding-search library FAISS (see the sketch below), and could easily be thousands of lines of code if you did it yourself.
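A hedged mapping of those hypotheticals onto FAISS index-factory strings (the exact parameters are illustrative, not a recommendation):

```python
import faiss

d = 1024  # same made-up dimensionality as above

# Each hypothetical maps (roughly) to a FAISS index-factory string:
flat     = faiss.index_factory(d, "Flat")       # full fp32, brute force
fp16     = faiss.index_factory(d, "SQfp16")     # 16-bit scalar quantization
int8     = faiss.index_factory(d, "SQ8")        # 8-bit scalar quantization
per_dim  = faiss.index_factory(d, "PQ1024")     # 256-entry palette per dimension
per_pair = faiss.index_factory(d, "PQ512")      # 256-entry palette per pair of dims
topics   = faiss.index_factory(d, "IVF3,Flat")  # 3 coarse "topic" clusters

# The quantized and IVF variants need training on sample vectors before
# adding data; at query time, setting topics.nprobe = 2 searches the two
# closest clusters instead of scanning everything.
```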
It’s definitely a good learning experience to implement your own embedding database atop git! Especially if you run it in production! 100MB is small enough that anything reasonable is going to be fast.
Could be an artifact of the small size not fully taking advantage of the GPU. For example, for the slightly larger Qwen3 0.6B model the A100 is faster (you can see it when scrolling to the bottom here: https://github.com/rasbt/LLMs-from-scratch/tree/main/ch05/11...)
From that table, the A100 tok/sec numbers (higher is faster) are:
- Eager: 28
- Compiled: 128
- KV cache eager: 26
- KV cache compiled: 99
The reason the KV cache is slower is likely that the code isn't GPU-optimized. On CPU, the KV cache is faster. To make it faster on GPU, you would, for example, pre-allocate the cache tensors on the device instead of `torch.cat`-ing them together on the fly (see the sketch below).
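A rough sketch of that difference (shapes and names are made up, and real code would handle values as well as keys); the second version avoids reallocating and copying the whole cache on every decode step:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
batch, n_heads, head_dim, max_len = 1, 16, 64, 4096  # made-up shapes

# torch.cat-based cache: every decode step allocates a fresh tensor and
# copies everything that was already cached.
k_cache = torch.empty(batch, n_heads, 0, head_dim, device=device)

def append_cat(k_new):
    global k_cache
    k_cache = torch.cat([k_cache, k_new], dim=2)
    return k_cache

# Pre-allocated cache: the buffer is created on the device once, and each
# step just writes the new keys into the next free slice.
k_buf = torch.empty(batch, n_heads, max_len, head_dim, device=device)
cur_len = 0

def append_prealloc(k_new):
    global cur_len
    step = k_new.shape[2]
    k_buf[:, :, cur_len:cur_len + step] = k_new
    cur_len += step
    return k_buf[:, :, :cur_len]
```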
Source about aviation: primary (I am at an airport now), and there are also no flights going into or out of JFK right now: https://www.jfkairport.com/flight-tracker?view=VIEW_DEPARTUR...