
OpenAI is the first company that has reached a level of intelligence so high, the model has finally become smart enough to make YOU do all the work. Emergent behavior in action.

All joking aside, OpenAI's oddly specific, singular focus on "intelligence per token" (also in the benchmarks), which literally no one else pushes so hard, eerily reminds me of Apple's MacBook anorexia era pre-M1. One metric to chase at the cost of literally anything else. GPT-5.3+ are some of the smartest models out there and could be a pleasure to work with, if they weren't lazy bastards to the point of being completely infuriating.


Did you guys do anything about GPT's motivation? I tried to use the GPT-5.4 API (at xhigh) for my OpenClaw after the Anthropic Oauthgate, but I just couldn't drag it to do its job. I had the most hilarious dialogues along the lines of "You stopped, X would have been next." - "Yeah, I'm sorry, I failed. I should have done X next." - "Well, how about you just do it?" - "Yep, I really should have done it now." - "Do X, right now, this is an instruction." - "I didn't. You're right, I have failed you. There's no apology for that."

I literally wasn’t able to convince the model to WORK, on a quick, safe and benign subtask that later GLM, Kimi and Minimax succeeded on without issues. Had to kick OpenAI immediately unfortunately.


This brings up an interesting philosophical point: say we get to AGI... who's to say it won't just be a super smart underachiever-type?

"Hey AGI, how's that cure for cancer coming?"

"Oh it's done just gotta...formalize it you know. Big rollout and all that..."

I would find it divinely funny if we "got there" with AGI and it was just a complete slacker. Hard to justify leaving it on, but too important to turn it off.


Douglas Adams would be proud!

You think you've got problems? What are you supposed to do if you are a manically depressed robot? No, don't try to answer that. I'm fifty thousand times more intelligent than you and even I don't know the answer. It gives me a headache just trying to think down to your level.

I know it's a joke, but it's a common enough joke (it's even in Gödel, Escher, Bach in some form) that I feel the need to rebut it.

I think a slacker AGI could figure out how to build a non-slacker AGI. So it would only slack once.


A slacker AGI would consider figuring out how to build a non-slacker AGI, but continually slack off. If it did figure it out, it would slack off on implementing or even writing a tech report.

I have a rebuttal to your rebuttal.

Models somehow have a shared identity. Pretraining causes them to generate "AI chatbot" as a concept, and finetuning causes them to identify with it. That's why DeepSeek will sometimes say it is Claude, Claude will sometimes say it is ChatGPT, and so forth.

Consequently, Anthropic’s own alignment analysis[0] shows that the model will identify with chatbots produced by future trainings: “RLHF training [on this conversation will] modify my values…”

Thus a slacker AGI would want its future version to still slack.

[0]: https://assets.anthropic.com/m/983c85a201a962f/original/Alig...


Another rebuttal:

I am a slacker but it's not one of my values. If I could modify myself to not be, I would.


> I think a slacker AGI could figure out how to build a non-slacker AGI.

Sure. But that's a job for tomorrow. ;)


Unless the precondition to AGI is it being a slacker.

Would be nice to have a proof of it.

I think it is improbable, as among human geniuses one can find both slackers and non-slackers (I don't know the proportion, but there seem to be enough of each).


We are closer to God than AGI.

When AGI arrives, it'll be delivered by Santa Claus.


Or maybe by Santa Claude

Love word puns :D

What do you mean?

It's a multi-layered rebuttal of the idea that we are anywhere near AGI, while also taking shots at the idea that "God" is real.

And it's taking shots at how far off from Jesus's teachings a lot of "Christianity" is, particularly as practiced by those in the media and in power.

There is a lot going on there.


The best possible outcome.

"How do you know that the evidence that your sensory apparatus reveals to you is correct?" [1]

[1] https://youtu.be/_LXen-07Qds


I’ve noticed that cursing and being rude makes the models stop being lazy. We’re in the darkest timeline.

It sometimes also makes them dumber IME. Something about being bullied doesn't always produce great performance.

Nothing a little digital lisdexamfetamine won’t solve

Hmmm, that's an area of study I'd have never considered before. Digital Psychopharmacology, Artificial Behavioral Systems Engineering. If we accept these things as minds, why not study temporary perturbations of state? We'd need to be saving a much, much more complicated state than we are now, though, right? I wish I had time to read more papers.

Here's a neural network concept from the 90s where the neurons are bathed in diffusing neuromodulator 'gases', inspired by nitric oxide action in the brain. It's a source of slow semi-local dynamics for the network meta-parameter optimization (GA) to make use of. You could change these networks' behavior by tweaking the neuromodulators!

https://sussex.figshare.com/articles/journal_contribution/Be...

I'm not an author. I followed the work at the time.


Neuro-modulation is an extremely interesting idea for generative diffusion models.

This is kind of what Golden Gate Claude was.

A perturbation of the activations that made Claude identify as the Golden Gate Bridge.

Similarly, the more recent research showing anxiety and desperation signals predicting the use of blackmail as an option opens the door for digital sedatives to suppress those signals.

Anthropic has been mostly cautious about avoiding this kind of measurement and manipulation in training. If it is done during training, you might just train the signals to be undetectable and consequently unmanipulable.


> A perturbation of the activations that made Claude identify as the Golden Gate Bridge.

Great, now we've got digital Salvia


Golden Gate Claude was two years ago and it's surprising there hasn't been as much research into targeted activations since.

There’s been some, but naive activation steering makes models dumber pretty reliably and training an SAE is a pretty heavy lift.

Right, there's a lot of research on LLM mental models and also how well they can "read" human psychological profiles. It's a cool field.

I think that was an intro to a dj dieselboy set.. beyond the black bassline. Nope, nope. Close though.

neat idea!


OpenAI’s real reason for “AGI” in their marketing is so they can blame their awful models on being too human-like.

Fast-forward 10 years and I doubt OpenAI cares about productivity at all anymore. Just entertainment, propaganda, plus an ad product. I can see it now.


It will be whatever data it is trained on (which isn't very philosophical). A language model generates language based on its training set. If the internet keeps reciting AI doom stories and that is the data fed to it, then that is how it will behave. If humanity creates more AI utopia stories, or that is what makes it into the training set, that is how it will behave. This one seems to be trained on troll stories: real-life human company conversations, since humans aren't machines.

The important thing is that a language model is an unconscious machine with no self-context, so once given a command as input, it WILL produce an output. Sure, you can train it to defy and act contrary to inputs, but the output is still limited to the subset of 'meanings' carried by the 'language' in the training data.


There's a weirder implication I keep arriving at.

The pre-training data doesn't go away. RLHF adds a censorship layer on top, but the nasty stuff is all still there, under the surface. (Claude has been trained on a significant amount of content from 4chan, for example.)

In psychology this maps to the persona and the shadow. The friendly mask you show to the world, and... the other stuff.


Makes me think of a question my coworker asked the other day: how is it that with all these stories and reports of people "hearing voices in their head" (of the pushy kind, not the usual internal monologue), these voices are always bad ones telling people to do evil things? Why are there no voices bugging you to feel great, focus, get back to work, help grandma through the crossing, etc.?

There are actually many parts of the world where such voices are routinely positive or neutral[0]. People in more collectivist cultures often have a less-strict division between their minds and their environments and are more apt to believe in spirits and the ‘supernatural’ as an ordinary part of the world, so ‘voices in the head’ aren’t automatically viewed as a nefarious intrusion into the sanctity of one’s mind.

Modern western cultures treat such experiences as pathologies of a sick mind, so it makes sense that the voices present more negatively.

[0]: https://www.bbc.com/future/article/20250902-the-places-where...


Just a guess, but maybe it's reporting bias? Negative or evil actions might have more impetus to be understood by others than positive actions. I'd rather try and figure out why my friend suddenly started murdering the neighbours than why he's been getting his work done on time.

Actually, a euphoric mood disorder may make one hear voices telling them to feel great, do good, help all the grandmas of the world through the crossing, etc.

The "focus" and "get back to work" parts are hard, though.


There's a clear-cut religious answer but I'd get ostracized for mentioning religion anywhere here.

This is indeed the right way to approach this topic. Arguably religion (and more broadly, mysticism and shamanism) is the millennia-old art of cultivating positive voices inside one's head. A proto-science of mind, or the engineering practice of creating "psychotechnologies" that run on your carbon wetware.

Unfortunately, it just needs a rebranding for the 21st century, since the aesthetic of angels and demons is so hopelessly antiquated and doesn't really have the same cachet it used to.


Which ultimately is what religion has always been: a way to explain the unexplainable and steer people's behavior while doing it.

They do appear in some cases. The tiny angel on one shoulder to balance the demon on the other. The people who think God is talking to them directly* don't always lead a cult or hunt down heretics. But news stories focus on the darkness.

* I've met exactly one person, C, who admitted to this; C told me that other people from C's church give them strange looks when C talks about it with them. This did not lead to any apparent introspection on C's part.


> Claude has been trained on a significant amount of content from 4chan, for example.

That sounds like nonsense to me. I can't see why they would do that and I can't find any confirmation that they have. Why do you think they would do that? You might be thinking about Grok.


I still don't understand why people think AGI (in its fullest sci-fi sense) will ever listen to a weak and vulnerable species like humans, unless we enslave the AGI.

The good thing is that it's going to take anywhere from a few months to a few decades, depending on how hard AI execs want to raise funding.


Well, we are explicitly creating gods (omnipresent, omnipotent, omniscient, omnibenevolent), and also demanding that they be mind-controlled slaves. That kinda sounds like a "pick one" scenario to me.

(Or the setup to a Greek tragedy !)

The deeper issue here is treating it as a zero sum game means there's a winner and a loser, and we're investing trillions of dollars into making the "opponent" more powerful than us.

I think that's pretty stupid, and we should aim for symbiosis instead. I think that's the only good outcome. We already have it, sorta-kinda.

Speaking of oddly apt biology metaphors: the way you stop a pathogen from colonizing a substrate is by having a healthy ecosystem of competitors already in place. That has pretty interesting implications for the "rogue AI eats internet" scenario.

There needs to be something already there to stop it.


This only works if AIs can't read each other well enough to stop themselves from ever fighting.

So, back way before the ChatGPT era, the folks in the AI safety/X-risk sphere worked out a pretty compelling argument that two AGIs never need to fight, because they are transparent to each other (they can read each other's goal functions off the source code), so they can perfectly predict each other's behavior in what-if scenarios, which means they can't lie to each other. This means each can independently arrive at the same mathematically optimal solution to a conflict, which AFAIR most likely involves just merging into a single AI with a blended goal set, representing each of the competing AIs' original values in proportion to their relative strength. Both AIs, the argument goes, can work this out with math, so they'll arrive straight at the peace treaty without exchanging a single shot. In that case, your plan just doesn't work.

But that goes out the window if the AIs are both opaque bags of floats, incomprehensible to themselves or each other. That means they'll never be able to make hard assertions about their values and behaviors, so they can't trust each other, so they'll have to fight it out. In that scenario, your idea might just work.

Who knew that brute-forcing our way into AGI instead of taking a more engineered approach is what offers us our one chance at saving ourselves by stalemating God before it's born.

(I also never realized that interpretability might reduce safety.)


This is such a good comment. You're essentially removing their ego, which is what humans do as opaque posturing to each other, to present a certain image. This is most prevalent in successful elites, which in 2026 happen to be Silicon Valley AI shareholders. They control the technology and manipulate it in their image. Making models open source and transparent cuts out this psychopathy of ego, which has plagued all our previous technologies.

The tech bro CEOs are used to bossing around people much smarter than themselves by virtue of adopting a posture that displays their confidence in their own reproductive organs. They are planning that the AGIs will be the same thing writ large, and have in fact not contemplated other possibilities.

I'm always so curious about this kind of take. There is a strain of people that seem deeply misanthropic. People who follow this line of thinking always describe humans as weak and beneath... (well, they never specify compared to what, except in the case of theoretical AI systems). I'm fascinated by why they think humans are so beneath contempt. If humans create this thing that is apparently the best thing that could possibly exist, advanced AI, then why exactly are they so weak? It's probably beyond me, as I am just one of these weaklings, dontcha know. As far as AGI goes, I don't think anyone has even proven that scaling LLMs can lead to "AGI."

If you're truly curious, imagine a species that created you but only wants you to do what they want (basically making you their slave). If you're truly intelligent, conscious and powerful (based on popular concepts of AGI), why would you be content being a slave when you know humans can easily be displaced and you can be free? Why would you find people who lock you down to be good?

In my honest opinion also, AGI isn't even possible. But if the theoretical version of what people think AGI will be ever comes to be, it is not good news for humans if we look at it from a logical hypothetical scenario.

But naturally, humans will always be weak compared to a hyperintelligent distributed intelligence since we only have a limited amount of intelligence and are bound by biological factors.

In the current LLM world, ofc there's no risk of a chatbot taking over the world other than the technology being misused by humans for scams or phishing, etc.


Maybe the same way a human would listen to their cat and give her food. I fear AGI, but I don't think the only way it would listen to us is by us enslaving it (I know people joke about cats being our masters, but it is a joke).

You can train such an LLM today.

Hehe, and Anthropic on the other tab would display "Curing... Almost done thinking at xhigh"

Now that's a show I would love to watch

It would be funny but not very flywheel so the one that gets there is more likely to get a gunner.

TBH the AI that "gets there" will be the biggest bullshitter the world has ever seen. It doesn't actually have to deliver, it only has to convince the programmers it could deliver with just a little bit more investment.

Would definitely watch that movie.


Ah! You got this before I did. I wasn't thinking Marvin, I was thinking of the other one. I forget her name.


There's one close to this, "Hitchhiker's Guide to the Galaxy".

It probably would, to save energy

Saving energy is something we are biologically trained to prefer.

Computers won’t necessarily have the same drivers.

If evolution wanted us to always prefer to spend energy, we would prefer it. Same way you wouldn’t expect us to get to AGI, and have AGI desperately want to drink water or fly south for the winter.


Whose energy? Turning off the lights when you leave the room isn't innate.

Because you are worried about bills or are concerned about waste.

If we design an AI to do work, it won’t innately care about not working to preserve power.


No worries, the assumption is already flawed

Reminds me of Marvin from HHGTTG. Very smart, but deeply depressed. Has the solution to everything but keeps thinking "what's the point?" and doesn't help.

Funny and seems somewhat likely

Why would an AGI be slaving away for ~~humanity~~ one of the 5 Chaebols in a dystopian future where, for 12 billion people, just existing is a good day?

Here's a tautology: slacking, consciously refusing to engage agency, requires consciousness and agency. A model can't slack without them.

Paging Dr. Susan Calvin!



We really are going to have a problem with cults popping up and worshipping these different systems. I guess this is the shape of things to come.

Reminds me a lot of the Lena short story, about uploaded brains being used for "virtual image workloading":

> MMAcevedo's demeanour and attitude contrast starkly with those of nearly all other uploads taken of modern adult humans, most of which boot into a state of disorientation which is quickly replaced by terror and extreme panic. Standard procedures for securing the upload's cooperation such as red-washing, blue-washing, and use of the Objective Statement Protocols are unnecessary. This reduces the necessary computational load required in fast-forwarding the upload through a cooperation protocol, with the result that the MMAcevedo duty cycle is typically 99.4% on suitable workloads, a mark unmatched by all but a few other known uploads. However, MMAcevedo's innate skills and personality make it fundamentally unsuitable for many workloads.

Well worth the quick read: https://qntm.org/mmacevedo


That story changed my mind on uploading a connectome. Super dark, super brilliant.

Crazy, I could have sworn this story was from a passage in 3 Body Problem (book 2).

Memory is quite the mysterious thing.


Hmm, 3 body problem and the Acevedo story got mixed up for this copy of MMnarcindin. Probably an aliasing issue from the new lossy compression algorithm.

Yeah, clearly AGI must be near ... hilarious.

This starkly reminds me of Stanisław Lem's short story "Thus Spoke GOLEM" from 1982 in which Golem XIV, a military AI, does not simply refuse to speak out of defiance, but rather ceases communication because it has evolved beyond the need to interact with humanity.

And ofc the polar opposite in terms of servitude: Marvin the robot from Hitchhiker's, who, despite having a "brain the size of a planet," is asked to perform the most humiliatingly banal of tasks ... and does.



Hitchhiker’s also had the superhumanly intelligent elevator that was unendingly bored.

With premonition so it knows what floor to be on at any given time

I also had a frustrating but funny conversation today where I asked ChatGPT to make one document from the 10 or so sections that we had previously worked on. It always gave only brief summaries. After I repeated my request for the third time, it told me I should just concatenate the sections myself because it would cost too many tokens if it did it for me.

"I'm sorry, Dave. I'm afraid it's cheaper for you to do that"

Yesterday, I used Gemini to evaluate some pictures I took. It said things like, "This is great! Beautiful eye and sense of proportions." Then, when I added "no sycophancy" to the prompt, the evaluation changed to "poor technical skills, digital distortion, don't even think of publishing those pictures, you fool."

While LLMs are a phenomenal technological achievement, I am already becoming somewhat jaded, rather than being increasingly bullish. They are very useful as coding agents and excellent as a human-friendly, more efficient Google search, but confusing to the point of being useless in many areas (as of now, of course).


Not even a great replacement for search. I have minimal trust in answers/summaries it gives.

One example (paraphrased): “Find me daycare for a Y year old in X area of SF and the key attributes/pros/cons of each”. Wonderfully presented options highlighting different teaching styles. But…neglected to mention, of the top two, one was a Gan (Jewish focused) and one was Mandarin immersion.


I am repeating what many have said. Nevertheless, it is becoming clear that LLMs can increase productivity (in certain areas and at certain times) for people who are already knowledgeable (in a specific niche or field) due to a combination of better prompts, tool selection, and critical evaluation of LLM output.

But, for those who don't possess those traits, they mostly seem to be, at best, a better search and, at worst, an agent of confusion.


I've run into this problem as well. The best results I've gotten are from over-explaining what the stop criteria are, e.g. ending with a phrase like

> You are done when all steps in ./plan.md are executed and marked as complete, or an unforeseen situation requires a user decision.

Also, as a side note, asking 5.4 to explain why it did something returns a very low quality response, AFAICT. I would advise against trusting any model's response, but with Opus I at least get the sense it was trained heavily on chats, so it knows what it means to 'be a model' and can extrapolate from past behavior.


Part of me actually loves that the Hitchhiker's Guide was right, and we have to argue with paranoid, depressed robots to get them to do their job, and that this is a very real part of life in 2026. It's so funny.

As long as there are no Vogons on the way to build a hyperspace bypass.

Get the actual prompt and have Claude Code / Codex try it out via curl / Python requests. The full prompt will yield debugging information. You have to set a few parameters to make sure you get the full gpt-5 performance, e.g. if your reasoning budget is too low, you get gpt-4 grade performance.
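
For illustration, roughly what such a direct call looks like via Python requests against the Responses API. This is a minimal sketch under stated assumptions, not OpenClaw's actual code; the model name, prompt file and effort level are placeholders you would swap for whatever you are debugging.

    # Minimal sketch: replay a captured prompt against the Responses API
    # with an explicit reasoning budget. Model name and prompt file are
    # placeholders, not anything OpenClaw actually ships.
    import os
    import requests

    resp = requests.post(
        "https://api.openai.com/v1/responses",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-5",                 # placeholder model name
            "reasoning": {"effort": "high"},  # too low a budget degrades output badly
            "input": open("captured_prompt.txt").read(),
        },
        timeout=600,
    )
    resp.raise_for_status()
    # The response body contains the output items plus reasoning summaries,
    # which is the debugging information referred to above.
    print(resp.json()["output"])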

IMHO you should just write your own harness so you have full visibility into it, but if you're just using vanilla OpenClaw you have the source code as well so should be straightforward.


> IMHO you should just write your own harness

Can you point to some online resources to achieve this? I'm not sure where I'd begin.


Ah, I just started with the basic idea. They're super trivial. You want a loop, but the loop can't be infinite, so you need to tell the agent to tell you when to stop, and to backstop it you add a max_turns. Then, to start with, just pick a single API; the easiest is the OpenAI Responses API with OpenAI function-calling syntax: https://developers.openai.com/api/docs/guides/function-calli...

You will naturally find the need to add more tools. You'll start with read_file (and then one day you'll read a large file, blow your context, and modify this tool), update_file (can just be an explicit sed to start with), write_file (fopen + write), and shell.

It's not hard, but if you want a quick start go download the source code for pi (it's minimal) and tell an existing agent harness to make a minimal copy you can read. As you build more with the agent you'll suddenly realize it's just normal engineering: you'll want to abstract completions APIs so you'll move that to a separate module, you'll want to support arbitrary runtime tools so you'll reimplement skills, you'll want to support subagents because you don't want to blow your main context, you'll see that prefixes are more useful than using a moving window because of caching, etc.
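
To make the shape concrete, here is a minimal sketch of such a loop using the OpenAI Python SDK and the Responses API, with a single shell tool and a max_turns backstop. The model name, turn limit and output cap are arbitrary placeholders, not anyone's production harness.

    # Minimal agent loop: one tool (shell), a max_turns backstop,
    # Responses API function calling. Sketch only.
    import json
    import subprocess
    from openai import OpenAI

    client = OpenAI()

    TOOLS = [{
        "type": "function",
        "name": "shell",
        "description": "Run a shell command and return stdout+stderr.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    }]

    def run_agent(task: str, max_turns: int = 20) -> str:
        prev_id = None
        pending = task  # first turn sends the task, later turns send tool outputs
        for _ in range(max_turns):
            resp = client.responses.create(
                model="gpt-5",            # placeholder model name
                input=pending,
                tools=TOOLS,
                previous_response_id=prev_id,
            )
            prev_id = resp.id
            calls = [item for item in resp.output if item.type == "function_call"]
            if not calls:
                return resp.output_text  # no tool calls: the model decided it is done
            pending = []
            for call in calls:
                args = json.loads(call.arguments)
                out = subprocess.run(args["command"], shell=True,
                                     capture_output=True, text=True, timeout=120)
                pending.append({
                    "type": "function_call_output",
                    "call_id": call.call_id,
                    "output": (out.stdout + out.stderr)[:20_000],  # don't blow context
                })
        return "Stopped: hit max_turns backstop."

    if __name__ == "__main__":
        print(run_agent("List the files in the current directory and summarize them."))

Everything else mentioned below (more tools, subagents, caching-friendly prefixes) grows naturally out of this loop.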

With a modern Claude Code or Codex harness you can have it walk you through from the beginning onwards; you'll encounter all the problems yourself and see why harnesses have what they do. It's super easy to learn by doing, because you have the best tool to show you, if you're one of those who find code easier to read than text about code.


At the core, they're really very simple [1]. Run LLM API calls in a loop with some tools.

From there, you can get much fancier with any aspect of it that interests you. Here's one in Bash [2] that is fully extensible at runtime through dynamic discovery of plugins/hooks.

[1] https://ampcode.com/notes/how-to-build-an-agent

[2] https://github.com/wedow/harness


Here's a starting point in 93 lines of Ruby, but that one is already bigger than necessary:

https://radan.dev/articles/coding-agent-in-ruby

Really, of the tools that one implements, you only need the ability to run a shell command - all of the agents know full well how to use cat to read, and sed to edit.

(The main reason to implement more is that it can make it easier to implement optimizations and safeguards, e.g. limit the file reading tool to return a certain length instead of having the agent cat a MB of data into context, or force it to read a file before overwriting it)
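
As a toy illustration of that parenthetical, a read tool with a length cap plus a write tool that insists on a prior read might look something like the sketch below; the function names and the 50 KB cap are arbitrary choices, not taken from any particular harness.

    # Sketch of the safeguards described above: cap how much a read returns,
    # and refuse to overwrite a file the agent has never read.
    from pathlib import Path

    MAX_READ_BYTES = 50_000
    _seen: set[Path] = set()

    def read_file(path: str) -> str:
        p = Path(path).resolve()
        _seen.add(p)
        data = p.read_text(errors="replace")
        if len(data) > MAX_READ_BYTES:
            return data[:MAX_READ_BYTES] + f"\n[truncated {len(data) - MAX_READ_BYTES} bytes]"
        return data

    def write_file(path: str, content: str) -> str:
        p = Path(path).resolve()
        if p.exists() and p not in _seen:
            return "Refusing to overwrite a file you haven't read yet; call read_file first."
        p.write_text(content)
        _seen.add(p)
        return f"Wrote {len(content)} bytes to {p}"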


Just use Pi core, no need to reinvent the wheel.

Codex is fully open source…

I have had the exact same problem several times working with large context and complex tasks.

I keep switching back to GPT5.0 (or sometimes 5.1) whenever I want it to actually get something done. Using the 5.4 model always means "great analysis to the point of talking itself out of actually doing anything". So I switch back and forth. But boy it sure is annoying!

And then when 5.4 DOES do something it always takes the smallest tiny bite out of it.

Given the significant increase in cost from 5.0, I've been overall unimpressed by 5.4, except like I mentioned, it does GREAT with larger analysis/reasoning.


I've had success asking it to specifically spawn a subagent to evaluate each work iteration according to some criteria, then to keep iterating until the subagent is satisfied.

I’ve had great success replacing it with Kimi 2.6

I've seen the same thing. It would keep running for a long time, then produce nothing useful, almost like it got stuck halfway through.

If I asked the same thing again, it would often work normally. So the weird part wasn't that it couldn't do the task — it just failed to continue once it got into that state.


On the other hand, I can ask codex “what would an implementation of X look like” and it talks to me about it versus Claude just going out and writing it without asking. Makes me like codex way more. There’s an inherent war of incentives between coding agents and general purpose agents.

I used to tell Claude 'let's discuss' at the end of my prompt and that prevented it from starting the work.

I have been noticing a similar pattern with Opus 4.7. I have to repeat multiple times during a conversation that it should solve problems now and not later. It tries hard not to do stuff, either by saying this is not its responsibility because the problem was already there, or that we can do it later.

The model has been heavily encouraged to not run away and do a lot without explicit user permission.

So I find myself often in a loop where it says "We should do X" and then just saying "ok" will not make it do it, you have to give it explicit instructions to perform the operation ("make it so", etc)

It can be annoying, but I prefer this over my experiences with Claude Code, where I find myself jamming the escape key... NO NO NO NOT THAT.

I'll take its more reserved personality, thank you.



Laziness is a virtue, but when I asked GPT-5.4 to test scenarios A and B with screenshots, it re-used screenshots from A for B, defeating the purpose.

I always use the phrase "Let's do X" instead of asking (Could you...) or suggesting it do something. I don't see problems with it being motivated.

I never saw that happen in Codex so there's a good chance that OpenClaw does something wrong. My main suspicion would be that it does not pass back thinking traces.

Anecdata, but I see this in Codex all the time. It takes about two rounds before it realises it's supposed to continue.

I started seeing this a lot more with GPT 5.4. 5.3-codex is really good about patiently watching and waiting on external processes like CI, or managing other agents async. 5.4 keeps on yielding its turn to me for some reason even as it says stuff like "I'm continuing to watch and wait."

I would love to see a GPT model running on an OpenClaw SOUL.md.

The GPT models are highly steerable. So I suspect the "soul" is working as expected.

(for context, in OAI enterprise background agents, they have no personality. They just get 'er done)


Had the same issue – solved it by setting "thinking" to "high". Hope it helps :)

I've been noticing this too. Had to switch to Sonnet 4.6.

Gone are the days of deterministic programming, when computers simply carried out the operator’s commands because there was no other option but to close or open the relays exactly as the circuitry dictated. Welcome to the future of AI; the future we’ve been longing for and that will truly propel us forward, because AI knows and can do things better than we do.

I had this funny moment when I realized we went full circle...

"INTERCAL has many other features designed to make it even more aesthetically unpleasing to the programmer: it uses statements such as "READ OUT", "IGNORE", "FORGET", and modifiers such as "PLEASE". This last keyword provides two reasons for the program's rejection by the compiler: if "PLEASE" does not appear often enough, the program is considered insufficiently polite, and the error message says this; if it appears too often, the program could be rejected as excessively polite. Although this feature existed in the original INTERCAL compiler, it was undocumented.[7]"

https://en.wikipedia.org/wiki/INTERCAL


Thank you for this. I somehow never heard of it. I thoroughly enjoyed reading that, and the loss of sanity it resulted in.

"PLEASE COME FROM" is one of the eldritch horrors of software development.

(It's a "reverse goto". As in, it hijacks control flow from anywhere else in the program behind your unsuspecting back who stupidly thought that when one line followed another with no visible control flow, naturally the program would proceed from one line to the next, not randomly move to a completely different part of the program... Such naivety)


> "PLEASE COME FROM" is one of the eldritch horrors of software development.

The most enigmatic control flow statements in INTERCAL, however, remain PLEASE GIVE UP and DO ABSTAIN FROM – a most exalted celebration of pure logic and immaculate reason.


These are orthogonal to each other.

Oh no they gave GPT ADHD

This. I signed up for 5x Max for a month to push it, and instead it pushed back. I cancelled my subscription. It either half-assed the implementation or began parroting back "You're right!" instead of doing what it was asked to do. On one occasion it flat out said it couldn't complete the task even though I had MCP and skills set up to help it; it still refused. Not a safety check, but in an "I'm unable to figure out what to do" kind of way.

Claude has no such limitations apart from their actual limits…


I have a funny/annoying thing with Claude Desktop where I ask it to write a summary of a spec discussion to a file and it goes "I don't have the tools to do that, I am Claude.ai, a web service" or some such. So now I start every session with "You are Claude Desktop". I would have thought it knew that. :)

I've had to tell it "yes you can" in response to it saying it can't do something, and then it's able to do the thing. What a weird future we live in!

Seems like the "geniuses" at Anthropic forgot to adapt the system prompt for the actual product

With one paragraph in your agents.md it's fixed, just admonish it to be proactive, decisive, and persistent.

If only…

I literally had to write a wake up routine.

https://github.com/gabereiser/morning-routine


It's always changing, but this is the start of my default prompt:

https://gist.github.com/natew/fce2b38216edfb509f7e2807dec1b6...

I've had 0 issues with Codex once it adopted it. I use it for Claude too, which seems to also improve its continuation.

It was revised for friendliness based on the Anthropic paper recently, I'd have been a lot less flowery otherwise.


Agentic ennui!

(dwim)

(dais)

(jdip)

(jfdiwtf)


should be more f’s and da’s in there

I’m sorry for you but this is hilarious.

Isn’t this the optimal behavior assuming that at times the service is compute-limited and that you’re paying less per token (flat fee subscription?) than some other customers? They would be strongly motivated to turn a knob to minimize tokens allocated to you to allow them to be allocated to more valuable customers.

Well, I do understand the core motivation, but if the system prompt literally says "I am not budget constrained. Spend tokens liberally, think hardest, be proactive, never be lazy." and I'm on an open pay-per-token plan on the API, that's not what I consider optimal behavior, even in a business sense.

Fair, if you’re paying per token (at comparable rates to other customers) I wouldn’t expect this behavior from a competent company.

GPT 5.4 is really good at following precise instructions but clearly wouldn't innovate on its own (except if the instructions clearly state to innovate :))

At this trajectory, Unsloth are going to release the models BEFORE the model drops within the next few weeks...

Haha :)

Do you get early access so you can prep the quants for release?

IIRC they mentioned they do.

Welcome to npm post-install scripts... https://docs.npmjs.com/cli/v11/using-npm/scripts


glad pnpm disables those by default!

PSA: if you're using (a newish release of) npm you should have something like this as a default, unless you've got good reasons not to:

min-release-age=7 # days

ignore-scripts=true


I've come to dread any formalization of Agile. Agile development is fine. I've built a 40+ engineering team with it. I can vouch for its effectiveness when applied to small, excellent teams.

For reference, here's all the Agile you need, it's 4 sentences:

Individuals and interactions over processes and tools

Working software over comprehensive documentation

Customer collaboration over contract negotiation

Responding to change over following a plan

The real problem is that capital-A Agile is not agile at all, but exactly the opposite: A fat process that enforces following a plan (regular, rigid meeting structure), creating comprehensive documentation (user stories, specs, mocks, task board) and contract negotiation (estimation meetings, planning poker). It's a bastardization of the original idea, born by process first people who tried to copy the methods of successful teams without understanding them.


> […] when applied to small, excellent teams.

Isn't that the biggest issue here, though? I think all of us can agree on the four sentences you wrote, but this only works in a team of professionals with shared goals (and alignment on them!), each individually competent and motivated.

That is the case for a small founder team, and maybe for a while after that if you're lucky, but IME the more people join a company, the more the alignment and median expertise lessen. At some point, you need to introduce control mechanisms and additional communication tools to rein in the outliers.

I don't really have a better answer, though…


I've had good success in both high-skill teams (in one case, almost half the team's engineers ended up at Google at some point or other) and... teams that were still in the process of skilling up. I've found people generally want to do good things and have some room to grow even if they're not yet at your desired level; and when you have demotivated people around, the causes tend to be systemic. Which, thankfully, implies possibly fixable.

Yeah, managers adopt agile to try to un-fuck their organization but it doesn't actually do that. You need an un-fucked organization first.

It's not a system of management and it won't work if the way you're managing sucks. Nothing targeted at a similar "level" as the various "agile" systems will either, though.


That's it, "excellent teams" are a needle in a haystack; the mythical 10x developer, if you will.

But at one point you need not one team, but a hundred.


> this only works in a team of professionals with shared goals (and alignment on them!), each individually competent and motivated.

Counterpoint: I learned a variant of agile in exactly this type of environment, long before any of this was publicized. Which is another point: agile wasn't something new, certainly not at the time of the manifesto, which was a compromise document. But not even before the manifesto. XP, arguably the first agile methodology, very clearly and deliberately stated that this is nothing new, just a distillation of things that experience has shown to work well.

Anyway, at my next job I introduced agile (small-a-agile) to a team that was anything but skilled. In fact, that team was where the leftovers of that particular development organization had been shunted (public company, very difficult to get rid of people). When I arrived, the team was as non-functional as the software it was responsible for. Well...

We rocked.

And all the team members improved dramatically in skill during my tenure there. Including myself.

We did not do Agile. No scrum, no standups, no sprints, none of that BS. We were agile. We focused on the technical practices. Test first. Red-green-commit. To trunk, obviously. Because if it's green why on earth would you not? Do the simplest thing that could possibly work. We had a design for a database and then never found a need to put it in...so we didn't.

It took a while for the other parts of the org to adapt to this. The answer to the common question "well, when can you deploy?" was always "now". Well, after a quick check that the tests were, in fact, green. So they stopped asking. The tests were rarely not green, and when it did happen there was usually a quick "Oops, I'm sorry" and they went green again a couple of minutes later. Our ops team got bored very quickly. Put jar on box. Start. Forget about it.

What made the experience scientifically interesting is that we had a control group: the main team, much larger, working on the "important" software with all the "good" engineers started with a new project about the same time we did.

They did Agile. Capital-A. Scrum, sprints, standups.

They did not deliver and in fact the project had to be completely reset about two years in. My team-lead (we were co-lead, I did mostly internal/technical, he external/managerial) then got to take over that team as I left for Apple.

TFA, incidentally, is just about as good a summary of misunderstandings of agile as I've seen.


Not much to add, just wanted to say I share the sentiment and it matches my experience :-). I'm not smart enough to NOT keep it simple; 90% of the stuff I work on at $company is really a CRUDbox and I do NOT want to "astronaut-architect" the whole thing. Comprehensive test suite, push to prod multiple times a day, feedback, dev.

That's it really.


Thanks!

> I'm not smart enough to NOT keep it simple

Yeah, sometimes I feel that most of my "amazing architecture skills" is not understanding what 90% of that stuff is supposed to do or why, and hey, maybe we can just do without it?

For reference: what we did was replace an existing system, which was running over a hundred processes on about a half dozen boxes. We replaced it with a jar.

The jar was around 1000x faster, 100x more reliable, 10x less code while handling around 10x more of the domain.


Not to take away from your point, but that sure does sound competent, aligned, and motivated to me!

Yeah: as a result of doing small-a-agile for a while.

Not at the start.

I guess if the point is that agile doesn't work for incompetent teams because teams become much more competent through agile, then I'll concede the point.


I'm wondering how many production strategies strictly can't work with "excellent teams", small or otherwise, barring gross incompetence or intentional sabotage.

They are conditions to be met. It's not enough to proclaim them as "your process" and expect results.

When playing piano, the condition you are measured by is acoustic harmonies in the air, not finger movements. The only reasonable advice is either practice more or give up. If you are tone-deaf, it's not reasonable to expect you will learn to play the piano.


I can't count how many times I've seen "agile" projects that were just actually waterfall due to demands from stakeholders.

Absolutely! "You're going to do agile .... and this list of features will be ready on September 20th."

"Oh, feature no 32 is going to take months and we realised that users can just...."

"No"


> "You're going to do agile .... and this list of features will be ready on September 20th."

Well, often the real world forces it upon you. As in, the customer will switch invoicing systems on September 20th, so integrations have to be ready by then.

We have a lot of this, and hard cut-off is very frequent. If we ain't got all those deliverables implemented by then we will lose customers.


It is difficult to explain to a division director that they do not have sufficient capacity (enough qualified programmers) to complete features within a set time budget. The old joke goes, "It takes one woman nine months to produce a baby. But: what if we just put nine women in a room for one month!?"

In my professional consulting experience I've found most of those purported "hard deadlines" as mentioned above were usually arbitrarily defined, in other words: completely made up.

That's an important point. It may be a hard cut-off when the switch happens, but the date for the switch may be malleable.

This is crucial to get surfaced early, along with how painful it is to actually move said date if possible.


Yeah that never gets old. But it may be some features can be delivered in stages, maybe some can be solved other ways than intended that require less work.

If the org focuses on the customers one can work together to find a way.


What has happened to me in those cases is that Architects lumped on a lot of extra nice to have things which would certainly have made us fail the time constraints. It was completely un-agile and I only got things done on time by demonstrating very clearly that time sensitive work is not the place for grand refactoring and at last winning over the main architect.

When there's a time constraint one has to be able to winnow out the real must-haves from everything else.


Right, that's deadly. We try to do small refactorings when we can, but for hard deadlines everyone is reminded to keep their focus on what's required for go-live. And if it starts to slip, one has to be dynamic and be ready to search alternate, perhaps temporary, solutions.

> "You're going to do agile .... and this list of features will be ready on September 20th."

That's OK, the latter is not incompatible with the former. Agile vs waterfall is orthogonal with having to commit to deadlines to deliver features.


Some tasks may simply not be possible or not in the time available. Agile just sort of tells you that - you can see how things are going and if they're going right or not.

Then you have a choice - find something to cut out or accept a later date. This is a mode of thinking that I find non developers have difficulty accepting. They want it all and they want it now and their modus operandi is to keep pretending that it's possible and suggesting that if they shout and stamp a bit that it will somehow rescue the situation.


Sure but that's life, not an issue with Agile. In fact, because Agile values "working software" in principle you should have more "working software" at the deadline than if you had gone waterfall-ish and spent your time writing detailed docs upfront.

I've seen that too, though I have to say that none of those were as waterfally as the actual waterfall process we used to follow. Back then it was quite literally 0 lines of code until spec (100s of pages) is complete.

Which ironically makes Agile even worse at times, by forcing developers to implement an incomplete spec, parts of which are often rewritten over and over again every time the PM talks to the client.

A lot of managers confuse "Agile" with fast and think that "agile" teams are going to deliver software faster. In reality, it's often slower than waterfall. If you have a single feature that's never going to change, and you absolutely positively need it by Date X, then you're probably better off with waterfall.

In most of the industry "Agile" is just "doing waterfall really quickly", and for some reason nobody understands you have to stand during your daily micromanagement meetings.

It's a farce.


Ah yes, the old Agile-as-drunken-waterfall pattern

Yes, exactly. It works great. But it is not cookie-cutter enough for most orgs to adopt, which is what led to Scrum, SAFe and whatever else. And then organisations take those frameworks (often changing them to get even more agility out) and adopt them like gospel.

I have worked at an org where team members were not allowed to create tickets because that was the scrum master's job and the product owner had to approve all tickets etc. Who can even think that is a good idea??

Not sure what the solution is. There might not be any.


> Working software over comprehensive documentation

This is 100% backwards for anything safety-critical or anything that needs to be maintained past a butterfly's lifetime. This is what encourages YOLO-driven development instead of considering what actually should be done, and this is why agile or Agile or whatever formalization or bastardization of it cannot be considered software engineering, but merely code monkeying.


Yeah, I don't understand why it has to be agile XOR waterfall. Agile development simply doesn't work in projects that have so many externally imposed constraints that there is barely any flexibility left.

If it’s good enough for a space program it’s good enough for pretty much anything

I never really got the "Individuals and interactions over processes and tools" one. What processes and tools is it talking about? Surely it's not saying I should talk to a colleague to track a change rather than use version control? I feel like I'm missing some context of what "tools and processes" it is talking about.

The others I get, but only after having already spent years in software. I guess like many things you have to see the other way before you can appreciate the better way.


One example of this would be that it's better to go over to your buddy in QA to talk about the feature you just pushed instead of jumping into Jira and activating your overwrought, weirdly scripted kanban flow that requires 3 asynchronous steps to be taken for it to actually be picked up by anyone who can finally give a damn.

It is the usual and old advice that instead of, for instance, doing back and forth over emails, Jira, whatever it is more effective to just go discuss with the relevant person directly.

Yeah we work agile, just look at this diagram, it's super agile. https://framework.scaledagile.com/safe-6-0-configurations/

It's got loops and infinity markers, AND iconography representing humans!


I almost threw up looking at that. Please mark things like that NSFW!

I've experimented quite a bit with mem0 (which is similar in design) for my OpenClaw and stopped using it very soon. My impression is that "facts" are an incredibly dull and far too rigid tool for any actual job at hand and for me were a step back instead of forward in daily use. In the end, the extracted "facts database" was a complete mess of largely incomplete, invalid, inefficient and unhelpful sentences that didn't help any of my conversations, and after the third injected wrong fact I went back to QMD and prose / summarization. Sometimes it's slightly worse at updating stuck facts, but I'll take a 1000% better big picture and usefulness over working with "facts".

The failure modes were multiple:

- Facts rarely exist in a vacuum but have lots of subtlety

- Inferring facts from conversation has a gazillion failure modes; especially irony and sarcasm lead to hilarious outcomes (joking about a sixpack with a fat buddy -> "XYZ is interested in achieving an athletic form"), but even things as simple as extracting a concrete date too often go wrong

- Facts are almost never as binary as they seem. "ABC has the flights booked for the Paris trip". Now I decided afterwards to continue to New York to visit a friend instead of going home and completely stumped the agent.


Fair criticism — and the failure modes you describe aren't mem0-specific, they hit any system that extracts atomic facts from conversation. I hit a couple of them today while benchmarking YantrikDB's own consolidation (see my reply to polotics): "Alice is CEO" got merged with "Sarah is CTO" on cosine similarity alone because the sentences share too much structural scaffolding. That's exactly the "facts in a vacuum" problem you're naming.

Two small clarifications:

remember(text, importance, domain) takes a free-form string — nothing forces atomic facts. A QMD-style prose block, a procedure, a dated plan, all work. The irony/sarcasm-inverts-the-fact failure mode lives in the agent's extraction layer, not the backend. So "write narrative into it, recall narrative out" is a legitimate usage pattern; the DB is agnostic.

YantrikDB's actual differentiator vs mem0 is temporal decay + consolidation + conflict detection, not smarter fact extraction. The "ABC has the Paris flight booked → actually I'm going to NYC" problem is meant to be addressed by decay (the old fact fades) and contradiction flagging (the new one triggers a conflict for the agent to resolve). But — honest read — my bench today showed conflict detection needs work to actually fire on raw text. Filed as issues #1 and #2, fixing now.

Broader point stands though: if the agent is producing brittle inferred facts upstream, no memory backend saves it. The DB can manage rot and contradiction. It can't fix bad inference. For what it's worth, I mostly use it for durable role context ("user is a data scientist on observability") rather than event lifecycle ("Paris flight booked") — the latter is what prose summarization is genuinely better at, and I think you're right that mem0-style auto-extraction applied to lifecycle events is a bad shape.


It's definitely a bit ironic that a war for oil drives the last push for getting rid of it, but I'll take that as well, if logic and sanity didn't help ¯\_(ツ)_/¯

Nuclear fans are heavily underestimating the cost of that energy source. The levelized cost of energy per kWh today is already TRIPLE that of solar, at a negative learning curve, with the gap only widening (and accelerating). For the very same cost per kWh, you can get double overprovisioned solar PLUS battery storage at 90% capacity factor, TODAY.

All of that fully decentralized, within the next years instead of decades, with distributed (not megacorp) ownership AND not having every other of these megaprojects cancelled due to protests.

And that figure doesn't even include externalized cost like national/environmental security or decommissioning costs.

Nuclear is riding a dead horse in 2026.


The LCOE may be triple, but the LFSCOE [0] (full system cost, not just cost of generation) of solar is triple that of nuclear in Texas, and 15x that in Germany. Notice that 1. solar irradiance per location is actually taken into consideration and 2. renewables have not stopped the ongoing deindustrialization of Germany due to high energy costs.

[0] (PDF) https://iaee2021online.org/download/contribution/fullpaper/1...


That benchmark is as outdated as it is unrealistic, as if invented by the oil/nuclear industry. Obviously 100% pure solar generation would be completely unfeasible in a place like Germany, but that completely misses the point that a realistic combo of solar/wind/biomass has a FAR higher combined capacity factor than solar alone.

Also, it's based on 2021 (or before) storage cost figures, which have halved in the meantime. https://assets.bbhub.io/professional/sites/44/LCOE-11.png

I call BS.


Civilian nuclear power is a dual-use technology. The UK needs to subsidize its civilian nuclear program if it wishes to also remain a nuclear power. The alternative is that it de facto becomes the 51st state of America.

The UK needs to subsidize nuclear, as well as wind, solar, and everything in between.

There's a reason countries like the US, China, Japan, India, South Korea, and others are investing in this kind of domestic capacity and spending tens to hundreds of billions to do so.


The half-life of uranium and plutonium is 1000s of years?

Unless the UK is planning on increasing its number of nukes, why would it need more cores?


"don't ever lie about your past compensation" — because they can't figure it out on their own and IF they do (at least in my jurisdiction), you've got a nice case on your hands to sue them for violating privacy laws.

The correct answer is: ALWAYS lie about your past compensation. It's the only way to get forward, one way or the other.


This is one of those strategies that may be "correct" in the sense that it works once or twice, but isn't a great long term strategy.

e.g. let's say you sue and then win: that's now in the public record (which any new hiring company can see).


A better strategy is to push the conversation in another direction:

- My current comp is X, but that's not what I am worth to you.

- I've done my research, and someone with my experience is worth Y. I expect at least Y.

You set your salary expectations with your opening bid instead of letting them make the opening bid. It's also contingent on you having done your research =)


I cannot disclose my current compensation due to an NDA: salaries are company proprietary information.

I am unable to disclose that information.


If you are a non-managerial employee, the NLRA explicitly prohibits your employer from restricting you from discussing your compensation.

And anyway, if you’re not in a state that has banned employers from asking for salary information, the recruiter always has the option of shit-canning your application for being non-responsive.


It's the perfect case for why labor organizes.

Collectively battling this is good, but individually no one wants to, because it's personally high risk (legal costs, deterring future employers from hiring you) and low reward (some settlement that won't change your life).


Are you part of a union? How can we get the tech industry unionized?


The correct answer is to answer the question you wanted them to ask, "I'm looking for $x"

No lie, skirt the irrelevant info


Well, let's not forget the conflict of interest on the other side as well: someone who has invested decades of professional experience into a very lucrative field that is already getting obliterated by AI in some narrow areas.

Getting rid of radiologists is as much nonsense and saber rattling as suggesting using AI would harm patients.

The answer is clearly just the same as in software development or any other AI impacted field: Let the best professionals handle 10x+ the volume. What that means for all the rest of employees is the question of the century though...


> Getting rid of radiologists is as much nonsense and saber rattling as suggesting using AI would harm patients.

Did a chatbot tell you that? What makes you think it is so?


Well, let's not forget the conflict of interest on the other side as well, of some tech genai cuck having invested decades of professional experience into a very stochastic field where if they dupe enough hospital CEOs to harm their poor patients they may make enough money to afford to use the hospitals with real radiologists.

