No, he didn't? He predicted that third parties would donate tokens to FOSS projects, not that the labs would. One is PR that started ages ago, the other is a reasonable prediction of where the world is going.
Not quite donate tokens directly (technically and practically weird), but donation -> compute has been out for a couple months on opub.dev (disclaimer: I built it). So his prediction was somewhat correct, if late!
If this wasn't CERN tech I would think I was being taken for a ride. Conventional wisdom is that distributed consensus is not possible at this kind of performance. Does anyone have a sense of how this is different and how my mental model is wrong?
> Conventional wisdom is that distributed consensus is not possible at this kind of performance
I'm not sure why you would think that? If you can assume the fiber is the same in both directions you know the round trip time is exactly double the latency of the connection. Then you know to phase shift your start time by that much when you get a start signal and you're in sync.
Obviously it's not trivial in practice, but it's not a fundamentally insurmountable problem.
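To make the arithmetic concrete, here is a minimal sketch of the idea above, assuming a perfectly symmetric link. The function names and the "lead time" convention are mine for illustration, not anything from the actual system:

```python
def estimate_one_way_delay(t_send: float, t_echo: float) -> float:
    """Half the measured round trip; only valid if the link is symmetric."""
    return (t_echo - t_send) / 2.0


def follower_wait(lead_time: float, one_way_delay: float) -> float:
    """Time the follower waits after the start signal arrives.

    Assumption: the sender says "start lead_time after I send this".
    The signal has already spent one_way_delay in flight, so the
    follower subtracts that from the lead time.
    """
    if one_way_delay > lead_time:
        raise ValueError("lead time is shorter than the link latency")
    return lead_time - one_way_delay


# Example: echo comes back after 10 time units, so one-way delay is 5.
delay = estimate_one_way_delay(t_send=0.0, t_echo=10.0)
wait = follower_wait(lead_time=100.0, one_way_delay=delay)
```

Any asymmetry between the two directions shows up directly as a sync error, which is where the "not trivial in practice" part comes in.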
Thanks. I thought it was interesting choosing arithmetic instead of some other relation because multimodal arithmetic (via CLP) is more of a PhD thesis than a blog post. Other relations might've made it easier to demonstrate a general query.
What I couldn't tell from the article was if the author somehow achieved a multimodal arithmetic relation without needing CLP using a stack machine. That would be a neat technique.
Man that is such a bummer. The Naval Support Activity (NSA) "base" is not a hardened military facility. I've never been to the one in Bahrain, but it's usually where you go to play ultimate frisbee, maybe some paintball if you are lucky, and other types of R&R. Usually have a Naval Exchange (NEX) which is like a really discounted 7-11 / gift shop / walmart (depending on where you are).
Schools getting blown up is also a bummer. Everything about this situation and maybe the world is a bummer.
As soon as we stop treating these as bummers, there is literally nothing stopping a cycle of destruction. There may not be anyway; I don't know, but giving up on empathy entirely seems even more dangerous than being bad at it.
I have plenty of sympathy for the victims but none for the aggressors in this illegal war.
You seem to be suggesting that not feeling sorry for the soldiers who got to evacuate without all their belongings somehow means I'm losing my humanity. That's a dangerous thing - lives of the innocent civilians who didn't choose to be bombed are more important. Aggressors could simply... leave and stop being in danger.
Similarly I have little pity for Russian soldiers losing lives in another illegal war of aggression, knowing how many war crimes they committed in their wake.
Great writeup. Only thing I didn't see in here was an analysis of the impact of players like Talaas[1] and their stupid faster hardware LLMs.
I feel like it could be majorly disruptive, but idk if it's going to prolong the apocalypse or bring it about sooner -- or if it's a big nothing burger.
I'm bullish on something like Talaas getting smaller and easy to put in a desktop. Imagine an RPG where NPCs... are way more complex and the entire game is very non-deterministic.
I think I would like that as well. The problem is that if we bake an LLM into HW and make it cheaper and very efficient to run, then all games will have the same AI slop content, which could get boring pretty fast. The alternative is that these cards should load a different / fine-tuned LLM per game, but then we already have GPUs for that and today's LLMs are nowhere near good enough at the size which a GPU can run.
> Pre-training allows organizations to build domain-aware models by learning from large internal datasets.
> Post-training methods allow teams to refine model behavior for specific tasks and environments.
How do you suppose this works? They say "pretraining" but I'm certain that the amount of clean data available in proper dataset format is not nearly enough to make a "foundation model". Do you suppose what they are calling "pretraining" is actually SFT and then "post-training" is ... more SFT?
There's no way they mean "start from scratch". Maybe they do something like generate a heckin bunch of synthetic data seeded from company data using one of their SOTA models -- which is basically equivalent to low resolution distillation, I would imagine. Hmm.
Pre-training means exposing an already-trained model to more raw text like PDF extracts, etc. (aka continued pre-training). You wouldn't be starting from scratch, but it's still pre-training because the objective is just next token prediction of the text you expose it to.
Post-training means everything else: SFT, DPO, RL, etc. Anything that involves things like prompt/response pairs, reward models, or benefits from human feedback of any kind.
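One way to see the difference in objectives is in how the training labels get built. This is a rough sketch, not any vendor's actual pipeline: the helper names are made up, and masking the prompt during SFT is a common choice rather than a universal one. The -100 value matches the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
IGNORE = -100  # positions with this label contribute nothing to the loss


def pretraining_labels(token_ids):
    """Continued pre-training: every token of the raw text is a target."""
    return list(token_ids)


def sft_labels(prompt_ids, response_ids):
    """SFT on a prompt/response pair: only response tokens are targets.

    Assumption: the prompt is masked out, so the model is trained to
    produce the response rather than to reproduce the prompt.
    """
    return [IGNORE] * len(prompt_ids) + list(response_ids)


# Hypothetical token ids, just to show the shapes.
raw_text = [11, 12, 13, 14]
prompt, response = [21, 22], [31, 32, 33]

pt = pretraining_labels(raw_text)
sft = sft_labels(prompt, response)
```

In both cases the actual loss is still next-token cross-entropy (the shift by one position happens in the training loop); what changes is which positions count and where the data comes from.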
Er, then what is the "already trained" model? I thought pre-training was the gradient descent through the internet part of building foundational models.
Yeah, this checks out. I wonder what they are doing to prevent semantic collapse. Also, I wonder if the base model would already be instruct and RLHF tuned or only pre-trained. Trying to do additional training without semantic collapse in a way that is meaningful would be interesting to understand. Presumably they are using adapters but I've never had much luck in stacking adapters.
i.e.:
1. Do I start with an RLHF tuned model, "pretrain" on top of that (with adapter or by freezing weights?), then SFT on top of that (stack another adapter, or add layer(s) and freeze weights?) (and where did I get the dataset? synthetic extraction from corpus?), then RL (adapter, add layer(s) and freeze?)
I can imagine that, as usual, you start with a few examples and then instruct an LLM to synthesize more examples out of that, and train using that. Sounds horrible, but actually works fairly well in practice.
> Mr Cannon-Brookes told investors he “couldn’t be more bullish” about the opportunities ahead, despite relentlessly selling his own shares in the company daily. The Nightly reports he kept selling 7665 shares on a daily basis even in the month prior to the results at prices ranging from $US161.11 (AU$227) a share on January 8 to $US105.14 on February 4.
> While ordinary Aussies are asked to make big changes, the 46-year-old decided to treat himself to a ritzy new private jet late last year, admitting to a “deep internal conflict” over the carbon-heavy method of travel.
> The Atlassian co-founder and CEO bought a Bombardier 7500 and will use it to travel across his vast business operations, which include a minority stake in the Utah Jazz NBA team and a sponsorship deal with Formula 1.
There's a great 1986 book "Designing and Programming Personal Expert Systems" by Feucht and Townsend that implements expert systems in Forth (and in the process, much of the capability of Prolog and Lisp).
Ha, you beat me to it! That book was my first thought when I saw this post. I have a copy sitting here on my bookshelf.
Just to expand on how bonkers this book is... they assume that everyone has easy access to a Forth implementation. So they teach you how to build a Lisp on top of it. Then they use the Lisp you just built to build a Prolog. Then, finally, they do what the topic of the book actually is: build a simple expert system on top of that Prolog.
To be fair, in the 1980s thanks to the Forth Interest Group (FIG), free implementations of Forth existed for most platforms at a time when most programming languages were commercial products selling for $100 or more (in 1980s dollars). It's still pretty weird, but more understandable with that in mind.
Constantly amused by the split in comments of any moderately innovative language post between ‘I don't care about all this explanation, just show me the syntax!’ and ‘I don't understand any of this syntax, what a useless language!’
If the language is ‘JavaScript but with square brackets instead of braces’ maybe the syntax is relevant. But in general concrete syntax is the least interesting (not least important, but easiest to change) thing in a programming language, and its similarity to other languages a particular reader knows less interesting still. JavaScript is not the ultimate in programming language syntax (I hope!) so it's still worth experimenting, even if the results aren't immediately comprehensible without learning.
In Prolog the syntax is incredibly important. It is designed to be metainterpreted with the same ease with which a for-loop might be written in another language.
This can be arbitrarily extended in very interesting, beautiful, and powerful ways. This is extraordinarily hard to achieve and did not happen by accident.
As a challenge, see how easy it is to write a metainterpreter in another language of your choice. Alternately, see if you can think of any way the metainterpretation system in Prolog could be improved.
Finally, think of what would happen to this if we changed the syntax and introduced something like object.field notation.
So while logical programming can be achieved with other syntaxes, the metainterpretive aspect will be lost. I have yet to see a language that does this better.
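For a sense of what the challenge involves, here is a minimal sketch of a Horn-clause solver in Python. The term encoding is my own assumption (tuples for compound terms, capitalized strings for variables, other strings for atoms), and like standard Prolog it skips the occurs check. Note that everything Prolog gives you for free (unification, variable renaming, backtracking) has to be written out by hand, and that this interprets a Python encoding of clauses rather than the host language's own programs:

```python
import itertools


def is_var(t):
    # Assumption: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()


def walk(t, s):
    # Follow variable bindings in substitution s until unbound or non-variable.
    while is_var(t) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    # Return an extended substitution, or None on failure (no occurs check).
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None


_fresh = itertools.count()


def rename(t, suffix):
    # Give a clause fresh variable names so separate uses do not clash.
    if is_var(t):
        return f"{t}_{suffix}"
    if isinstance(t, tuple):
        return tuple(rename(x, suffix) for x in t)
    return t


def solve(goals, program, s):
    # Depth-first, left-to-right resolution; generators provide backtracking.
    if not goals:
        yield s
        return
    first, rest = goals[0], goals[1:]
    for head, body in program:
        n = next(_fresh)
        s2 = unify(first, rename(head, n), s)
        if s2 is not None:
            yield from solve([rename(g, n) for g in body] + rest, program, s2)


def resolve(t, s):
    # Substitute bindings all the way through a term.
    t = walk(t, s)
    if isinstance(t, tuple):
        return tuple(resolve(x, s) for x in t)
    return t


program = [
    (("parent", "tom", "bob"), []),
    (("parent", "bob", "ann"), []),
    (("grandparent", "X", "Z"), [("parent", "X", "Y"), ("parent", "Y", "Z")]),
]
query = [("grandparent", "tom", "Who")]
answers = [resolve("Who", s) for s in solve(query, program, {})]
# answers == ["ann"]
```

The vanilla Prolog metainterpreter does the equivalent in about three clauses because programs are already terms and unification is built in, which is the point being made above.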
Nice link, thank you! I'm not sure it's super related to my comment but it is closely related to some other things I'm thinking about. I'll give it a read :)
[1] https://www.youtube.com/watch?v=m-bT5v5Tm7w&t=164s