For personal projects, I'd rather just pay $2/month and not think about it than get hit with a random bill and scramble to migrate before the next one arrives. Bunny is perfect for this use case where you have a handful of projects that aren't all actively maintained. It just works without hand-holding, and since you're paying for the service, there's no rugpull looming.
The biggest bill I've gotten from Bunny was like $10 when my app (https://atlasof.space) briefly went viral and got 100k+ views in a month. Bunny CDN is so reasonably priced and the realistic visitor ceiling for my projects is low enough that it's still negligible. The free->paid cliff is typically a lot steeper than this in my experience.
> In order to keep your service online, you are required to keep a positive account credit balance. If your account balance drops low, our system will automatically send multiple warning emails. If despite that, you still fail to recharge your account, the system will automatically suspend your account and all your pull zones. Any data in your storage zones will also be deleted after a few days without a backup. Therefore, always make sure to keep your account in good standing.
You replenish your balance proactively, so in the worst case you can just let the account lapse; there's no surprise bill.
Barely any of them break 0% on any of the demo tasks: Claude Opus 4.6 comes out on top with a few sub-3% scores, Gemini 3.1 Pro gets two nonzero scores, and the others (GPT-5.4 and Grok 4.20) score 0% across the board.
Pre-release, I would have expected Gemini 3.1 Pro to get ahead of Opus 4.6, with GPT-5.4 and Grok 4.20 trailing. Guess I shouldn't have bet against Anthropic.
Not like it's a big lead as of yet. I expect to see more action within the next few months, as people tune the harnesses and better models roll in.
This is far more of a "VLA" task than it is an "LLM" task at its core, but I guess ARC-AGI-3 is making an argument that human intelligence is VLA-shaped.
My broad vibe is that Gemini 3.1 Pro is the best at visual/spatial tasks and oneshotting, while Opus 4.6 is the best at path planning. This task leans heavily on both, but maybe a little more toward planning, so I'm not too shocked that Opus is narrowly on top.
When running, the grids are represented in JSON, so the visual component is nullified but it still requires pretty heavy spatial understanding to parse a big old JSON array of cell values. Given Gemini's image understanding I do wonder if it would perform better with a harness that renders the grid visually.
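A minimal sketch of what such a harness step might look like, turning the JSON grid into a compact rendering instead of raw arrays. The glyph palette and function name here are made up for illustration (a real harness would presumably render an actual image for a vision model):

```python
import json

# Invented glyph palette for the sketch; cell values not in the map
# fall back to "?".
GLYPHS = {0: ".", 1: "#", 2: "@", 3: "o"}

def render_grid(grid_json: str) -> str:
    """Render a JSON 2D array of cell values as one glyph per cell."""
    grid = json.loads(grid_json)
    return "\n".join(
        "".join(GLYPHS.get(cell, "?") for cell in row) for row in grid
    )

print(render_grid("[[0,0,1],[0,2,1],[3,0,0]]"))
# ..#
# .@#
# o..
```

Even this ASCII form is far easier to scan spatially than a flat array of integers, which is the intuition behind giving Gemini a visual rendering instead.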
Given the drastic difference in price, I think the chart definitely shows Gemini 3.1 in the best light. Google DeepMind achieves basically the same thing, but they're willing to spend as much on electricity as Anthropic is to hit those benchmark scores.
The individual task scores are all on public tasks; they still held out a hundred or so private tasks, which presumably GPT-5.4 did well on to earn its leaderboard position.
The point is still to test frontier models at the limit of their capabilities, regardless of how it's branded. If we're still capable of doing so in 2057 I'll upvote the ARC-AGI-26 launch post!
At least according to the Head of Product at X, Sora was by far the most widely used tool to create fake war videos[0] aiming to push various false narratives. Given how popular fake content is at Meta I can only imagine what they see there (if they even have anybody looking at this kind of thing).
On X, viewing actual war footage was locked behind age-gating and identity verification, while any idiot's fake war footage was uncensored and consumable by anyone.
I understand that misinformation is a bad thing, and your point is taken that I was probably too quick to brush off the worst thing that Sora did as 'some funny memes'. But still. Photoshop is used to make a lot of misinformation, probably 1000x to 10,000x as much as Sora did, or even more than that. Does anyone say the latest version of Photoshop is like unveiling a weapon? Does anyone say that AI driven generative fill in Photoshop is like creating killing robots?
Easy to forget, but there was a ton of industry and investor excitement around computer vision from ~2015-2021, to the extent that the "MLOps" niche sprang up around it. This was called AI at the time, and it mostly went out the window when general-purpose pretrained models arrived.
We’re reaching a saturation threshold where older models are good enough for many tasks, certainly at 100x faster inference speeds. Llama3.1 8B might be a little too old to be directly useful for e.g. coding but it certainly gets the gears turning about what you could do with one Opus orchestrator and a few of these blazing fast minions to spit out boilerplate…
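A hedged sketch of that orchestrator/minion split: route rote boilerplate tasks to a small fast model and hard tasks to a large one. The `call_model` function, routing rule, and model names are all stand-ins invented for this sketch, not a real API:

```python
from dataclasses import dataclass

@dataclass
class Task:
    prompt: str
    boilerplate: bool  # caller's judgment: is this rote generation?

def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real inference call to whichever backend you use.
    return f"[{model}] {prompt}"

def dispatch(task: Task) -> str:
    """Route boilerplate to the cheap fast model, everything else up."""
    model = "fast-8b-minion" if task.boilerplate else "big-orchestrator"
    return call_model(model, task.prompt)

print(dispatch(Task("generate CRUD endpoints", boilerplate=True)))
print(dispatch(Task("redesign the auth flow", boilerplate=False)))
```

In practice the interesting part is the routing decision itself, which you'd likely want the orchestrator model to make rather than a boolean flag.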