Hacker News | taylorfinley's comments

Surely they are testing their optimizations against common benchmarks internally? I bet the "real world task" degradation is larger by some multiple than it appears when measured through a benchmark that is part of the target.

I've noticed this and thought about it as well. I have a few suspicions:

Theory 1: Some increasingly large share of inference compute is moving over to serving the new model for internal users (or partners that are trialing the next models). This leaves less compute for the previous model even as demand for it keeps growing. Providers may respond by using quantizations or distillations, compressing the KV cache, tweaking parameters, and/or changing system prompts to try to use fewer tokens.

Theory 2: Internal evals are obviously done using full-strength models with internally optimized system prompts. When models are shipped into production the system prompt will inherently need changes. Each time a problematic issue rises to the attention of the team, there is a solid chance it results in a new sentence or two added to the system prompt. These grow over time as bad shit happens with the model in the real world. But it doesn't even need to be a harmful case or buggy behavior of the model: even newer models with enhanced capabilities (e.g. mythos) may get guarded against in the prompts used in agent harnesses (CC) or in system prompts, resulting in a more and more complex system prompt. This imposes something like a "cognitive burden" on the model, and the production setup diverges further and further from the one used in the eval.


I can see a market for virtual copies of incredibly unpopular CEOs, but I don't think Mark would like how people would likely choose to use these digital effigies.

I've actually switched back to the web chat UI and copying Python files for much of my work because CC has been so nerfed.

I've seen this frequently also

I suspect it happens when the model's adaptive thinking was too conservative and it could have thought more, but didn't.

Right? Just add this to .bashrc:

alias yt-pl='yt-dlp -o "%(channel)s/%(playlist_title)s/%(title)s.%(ext)s" -a playlists.txt'
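
A minimal sketch of the batch file that alias reads, assuming `playlists.txt` sits in the directory where you run `yt-pl`. The URLs are placeholders rather than real playlists. yt-dlp's `-a` option takes one URL per line and skips lines that start with `#`.

```shell
# Create a sample batch file for `yt-dlp -a` (placeholder URLs, replace with your own).
cat > playlists.txt <<'EOF'
# one playlist URL per line; lines starting with '#' are ignored
https://www.youtube.com/playlist?list=PLACEHOLDER_ID_1
https://www.youtube.com/playlist?list=PLACEHOLDER_ID_2
EOF

# Count the non-comment, non-blank lines, i.e. the URLs that would be processed.
url_count=$(grep -Evc '^(#|$)' playlists.txt)
echo "$url_count playlists queued"
```

With that file in place, running `yt-pl` from the same directory should sort downloads into channel/playlist folders per the `-o` template.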


For both of these scenarios, it seems to happen when the context window is getting full and the context is summarized. I've found it usually works to respond with the right file, i.e. "great, let's apply those changes in @path/to/file", but it may also be a good time to return to an earlier conversation point by editing one of your previous messages. You might edit the message that got you the response with changes not linked to a specific file; including the file path in that prompt will usually get you back on track.


probably skips the step where you say "take a look at path/to/file" and the model converts that to a tool call


Are you trying to be cute? These were clearly stolen; a century passing doesn't make it any less of a shameful crime.


> Are you trying to be cute?

Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.

https://news.ycombinator.com/newsguidelines.html


From the article...

"William Peppé handed the gems, relics and reliquaries to the colonial Indian government: the bone relics went to the Buddhist King of Siam (Rama V). Five relic urns, a stone chest and most other relics were sent to the Indian Museum in Kolkata - then the Imperial Museum of Calcutta.

Only a small "portion of duplicates", which he was allowed to keep, remained in the Peppé family, he notes. (Sotheby's notes say Peppé was allowed to keep approximately one-fifth of the discovery.)

Sources told the BBC the auction house considers the "duplicates" to be original items considered surplus to those donated, which the "Indian government permitted Peppé to retain"."

If this is true, it doesn't sound "clearly stolen" to me. Frankly, if there was serious reason to doubt that story, I would have expected the article to quote somebody willing to say so, rather than just expressing vague unease and general hand-wringing about the optics of the situation.


I can't help but feel like this is a satirical send-up of "tech bros solve farming," except it's not satire.

I am a software engineer, and I also run a small family farm. I have 3D printers and laser cutters and lots of aluminum extrusion and Raspberry Pis... but I keep those things indoors, away from the dirt, sun, and rain. I can't imagine a real farmer using a contraption like this. Tools have to be reliable to last. I have to replace my solid steel shovels every few years because they wear out. How is this supposed to work?


I'm both as well. Imagine all that maintenance of keeping a hobby electronics project outside, all just to remove maybe 5% of the effort of growing vegetables. You can't even grow anything tall with it.

If they have solid planning software that accounts for crop rotation, companion planting, etc., then that's already a much better value proposition.


What's your point, that it's not durable enough? Based on what? It doesn't have a shovel attachment either, as far as I can tell.


Lol, dirt is going to get in all of it and it's very hard to clean extruded aluminum rails, not to mention how small those wheels were. How's it driven - belts / gears? How often are you going to disassemble and maintain this thing? What's the maintenance schedule like? I would bet it is more demanding than planting a 4x8 raised bed.

That said, I still love the project. I don't think the point is to grow plants maximally efficiently at this point; it's an early release of something cool and it's open source.

