Apache Arrow is trying to do something similar, using FlatBuffers to serialize with zero-copy and zero-parse semantics, and an index structure built on top of that.
Arrow has a different use case, I think. Lite3 / TRON is effectively more efficient JSON. Arrow uses an array per property, which allows zero-copy per-property access across TB-scale datasets, amongst other useful features; it's more like the core of a database.
A closer comparison would be to FlatBuffers, which is used by Arrow IPC; a major difference being that TRON is schemaless.
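To make the "array per property" point concrete, here is a rough pyarrow sketch (the column names and values are made up for illustration):

```python
import pyarrow as pa

# One Arrow array per property ("column"); each lives in its own
# contiguous buffer rather than being interleaved row by row.
table = pa.table({
    "user_id": pa.array([1, 2, 3, 4], type=pa.int64()),
    "score": pa.array([0.1, 0.9, 0.4, 0.7], type=pa.float32()),
})

# Grabbing or slicing one column is zero-copy: you get views over the
# existing buffers, with no per-row parsing.
scores = table.column("score")
first_two = scores.slice(0, 2)
print(first_two)
```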
My threshold for “does not need to be smaller” is “can this run on a Raspberry Pi”. This is a helpful benchmark for maximum likely useful optimization.
Curious about comparisons with Apache Arrow, which uses FlatBuffers to avoid memory copies during deserialization, is well supported by the Pandas ecosystem, and lets users serialize arrays of numbers with hardware support on GPUs (int8-64, float).
Apache Arrow is more of a memory format than a general‑purpose data serialization system. It’s great for in‑memory analytics and GPU‑friendly columnar storage.
Apache Fory, on the other hand, has its own wire‑stream format designed for sending data across processes or networks. Most of the code is focused on efficiently converting in‑memory objects into that stream format (and back) — with features like cross‑language support, circular reference handling, and schema evolution.
Fory also has a row format, which is a memory format, and can complement or compete with Arrow’s columnar format depending on the use case.
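For contrast, here's roughly what "serializing" Arrow data looks like. This is plain pyarrow, not Fory; it's just to show that Arrow's wire form (the IPC stream) is basically its memory format written out with FlatBuffers-framed metadata:

```python
import pyarrow as pa
import pyarrow.ipc as ipc

batch = pa.record_batch({"x": [1, 2, 3], "y": [0.5, 1.5, 2.5]})

# Arrow's IPC stream is essentially the in-memory buffers written out
# as-is, framed with FlatBuffers-encoded metadata describing the schema
# and buffer layout.
sink = pa.BufferOutputStream()
with ipc.new_stream(sink, batch.schema) as writer:
    writer.write_batch(batch)
buf = sink.getvalue()

# Reading maps those buffers back rather than parsing rows one by one.
reader = ipc.open_stream(buf)
print(reader.read_all())
```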
How are you a landlord if you're paying property taxes?
Once you have everything else set up, you can migrate to a server hosted on your own internet connection. Running your own data center is one of the trickier parts of the equation, compared to almost-free web hosting for a 10MB site.
You’re also only renting your internet connection!
If you want to be a real rent-seeker (sorry, meant to say “landlord”) you’ll need to purchase an AS and become a BGP-peering sovereign citizen cutting deals with backbone networks.
> ... you’ll need to purchase an AS and become a BGP-peering sovereign citizen cutting deals with backbone networks
Which is doable as an individual. One of my very best mates did just that: granted, he's got quite the networking skills, but he did it entirely on his own.
He'll even get 256 IPv4 addresses, but for those he was put on a long waiting list (I think he'll get them in one to two months, but he's been waiting for about a year): IPv4 addresses are the actual scarce, landlordy Internet resource!
Your buddy probably still has to pay to exchange traffic with the backbones; they only do it for free (settlement-free peering) if you're also a tier 1 ISP (see Lumen's requirements[1] and the others'[2]). Even massive ISPs like Comcast still have to pay for internet access[3] because they're not big enough to be a "peer" to the tier 1s.
PS. I think you're shadowbanned since this and your last 7 comments all showed up as [dead].
And even with a decentralized mesh network you rely on good behavior from your peer/local nodes. Turns out the only way to truly own land is when your network consists of 10.0.0.0/8.
It’s not a racket — the state does use its monopoly on violence to enforce your title to the land. Otherwise it would only be yours until someone bigger and stronger came by.
And the mob really does honor their protection racket, too. If some punk comes and tries messing with a store protected by the mob, the mob deals with the problem.
Yet nobody goes around looking to purchase protection from the mob either, do they? The key problem with the arrangement isn't that the protection isn't provisioned, it's that the entire arrangement is involuntary and forced upon the business owner through threat of violence, whether by the mob or the state.
Interesting! Text files in git can work for small sizes, like your 100MB.
That is what's known in FAISS as a "flat" index: just one thing after another. And obviously you can query the key-value store that is git by primary key, and do atomic updates as you'd expect. In SQL land this is an unindexed column: you can do primary-key lookups on the table, or you can scan every row to find what you want.
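For reference, that flat baseline is only a few lines of FAISS (the dimensionality and data here are made up):

```python
import numpy as np
import faiss

d = 1024                                          # embedding dimensionality (made up)
xb = np.random.rand(10_000, d).astype("float32")  # stored embeddings
xq = np.random.rand(5, d).astype("float32")       # query embeddings

# "Flat" = no index structure at all: every query is compared against
# every stored vector, exactly like scanning an unindexed column.
index = faiss.IndexFlatL2(d)
index.add(xb)
distances, ids = index.search(xq, 10)             # top-10 neighbors per query
```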
If you don't need fast query times, this could work great! You could also use SQL (maybe an AWS Aurora Postgres/MySQL table?) and stuff the fact and its embedding into a table, and get declarative relational queries (find me the 10 statements from users A-J closest to embedding [0.1, 0.2, -0.1, ...] within the past day). Lots of SQL databases are getting embedding search (Postgres, SQLite, and more), so your embedding search can happen in a few milliseconds instead of a few seconds.
It could be worth sketching out how to use SQLite for your application, instead of using files on disk: SQLite was designed to be a better alternative to opening a file (what happens if power goes out while you are writing a file? what happens if you want to update two people's records, and not get caught mid-update by another web app process?) and is very well supported by many language ecosystems.
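A minimal sketch of that SQLite direction (the table and column names are made up); the transaction is what buys you crash-safety and keeps other processes from seeing half-done updates:

```python
import sqlite3
import numpy as np

conn = sqlite3.connect("facts.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS facts (
        id        INTEGER PRIMARY KEY,
        user      TEXT NOT NULL,
        body      TEXT NOT NULL,
        embedding BLOB NOT NULL   -- float32 vector stored as raw bytes
    )
""")

vec = np.random.rand(1024).astype("float32")

# Either both statements land or neither does, even if the power goes out
# mid-write or another web-app process is reading at the same time.
with conn:
    conn.execute(
        "INSERT INTO facts (user, body, embedding) VALUES (?, ?, ?)",
        ("alice", "some fact", vec.tobytes()),
    )
    conn.execute(
        "UPDATE facts SET body = ? WHERE user = ?",
        ("corrected fact", "bob"),
    )
```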
Then, to take full advantage of vector-embedding engines, the questions start stacking up:
- My embedding is 1024 dimensions of 32-bit floats. Do I need to keep all of that precision? Is 16-bit okay? 8-bit floats?
- What about reducing the dimensionality?
- Is accuracy and recall still good enough if I represent each dimension with an index into a palette of the best 256 floats for that dimension? What about representing each pair of dimensions with an index into a palette of the best 256 pairs of floats for those two dimensions?
- What about, instead of scanning every embedding one by one, exploiting the fact that people talk about one of three topics: keep a separate index per topic, find your closest topic first (or maybe the closest two?), and then search only those smaller indices?
Each of these hypotheticals is literally a different "index string" in the embedding-search library FAISS (see the sketch below), and could easily be thousands of lines of code if you did it yourself.
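A hedged mapping of those hypotheticals onto FAISS index-factory strings (the exact parameters are illustrative, not a recommendation):

```python
import faiss

d = 1024  # same made-up dimensionality as above

# Each hypothetical maps (roughly) to a FAISS index-factory string:
flat     = faiss.index_factory(d, "Flat")       # full fp32, brute force
fp16     = faiss.index_factory(d, "SQfp16")     # 16-bit scalar quantization
int8     = faiss.index_factory(d, "SQ8")        # 8-bit scalar quantization
per_dim  = faiss.index_factory(d, "PQ1024")     # 256-entry palette per dimension
per_pair = faiss.index_factory(d, "PQ512")      # 256-entry palette per pair of dims
topics   = faiss.index_factory(d, "IVF3,Flat")  # 3 coarse "topic" clusters

# The quantized and IVF variants need training on sample vectors before
# adding data; at query time, setting topics.nprobe = 2 searches the two
# closest clusters instead of scanning everything.
```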
It’s definitely a good learning experience to implement your own embedding database atop git! Especially if you run it in production! 100MB is small enough that anything reasonable is going to be fast.
Could be an artifact of the small size not fully taking advantage of the GPU. For example, for the slightly larger Qwen3 0.6B model the A100 is faster (you can see it when scrolling to the bottom here: https://github.com/rasbt/LLMs-from-scratch/tree/main/ch05/11...)
From that table, the A100 tok/sec numbers (higher is faster) are:
- Eager: 28
- Compiled: 128
- KV cache eager: 26
- KV cache compiled: 99
The reason the KV cache is slower is likely that the code isn't GPU-optimized. On CPU, the KV cache is faster. To make it faster on GPU, you would, for example, pre-allocate the cache tensors on the device instead of `torch.cat`-ing them together on the fly (see the sketch below).
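A rough sketch of that difference (shapes and names are made up, and real code would handle values as well as keys); the second version avoids reallocating and copying the whole cache on every decode step:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
batch, n_heads, head_dim, max_len = 1, 16, 64, 4096  # made-up shapes

# torch.cat-based cache: every decode step allocates a fresh tensor and
# copies everything that was already cached.
k_cache = torch.empty(batch, n_heads, 0, head_dim, device=device)

def append_cat(k_new):
    global k_cache
    k_cache = torch.cat([k_cache, k_new], dim=2)
    return k_cache

# Pre-allocated cache: the buffer is created on the device once, and each
# step just writes the new keys into the next free slice.
k_buf = torch.empty(batch, n_heads, max_len, head_dim, device=device)
cur_len = 0

def append_prealloc(k_new):
    global cur_len
    step = k_new.shape[2]
    k_buf[:, :, cur_len:cur_len + step] = k_new
    cur_len += step
    return k_buf[:, :, :cur_len]
```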
Source about aviation: primary (I am at an airport now), and there are also no flights going into or out of JFK right now: https://www.jfkairport.com/flight-tracker?view=VIEW_DEPARTUR...