> The developer is rarely the person pitching the feature, and is normally given the constraints and the PRD
This heavily depends on the industry and company culture.
I've pitched plenty of features and I've basically never had a spec land on my desk ready to go. Part of my job as a SWE is to help product folks decide what to build.
I don't know. To me, this is a human problem. Not only does the model have access to the production database, they also keep the online backups on the same volume and have an offline backup that is 3 months old. This is an accumulation of bad practices, all of them human design failures. Instead of sitting down and rethinking their entire backup strategy, they go public on Twitter and blame a probabilistic machine for doing what is within its parameters to do. I bet even that failure could have been avoided had more care been given to what they do.
No, this is a "being stupid enough to trust an LLM" problem. They are not trustworthy, and you must not ever let them take automated actions. Anyone who does that is irresponsible and will sooner or later learn the error of their ways, as this person did.
More so an environment problem. An agent doing staging or development tasks should never be able to get access to prod API credentials, period. Agents that do have access to prod should have their every interaction with the outside world audited by a human.
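A minimal sketch of that separation, not anyone's actual setup. Everything here is an assumption for illustration: the `PROD_` prefix convention for secrets, the `scrubbed_env` and `run_action` helpers, and the `approve` callback standing in for a human reviewer.

```python
# Hypothetical convention: any variable prefixed PROD_ is a production secret.
PROD_PREFIX = "PROD_"


def scrubbed_env(env, agent_scope):
    """Return a copy of env that an agent with the given scope may inherit.

    Non-prod agents never see PROD_* variables at all.
    """
    if agent_scope == "prod":
        return dict(env)
    return {k: v for k, v in env.items() if not k.startswith(PROD_PREFIX)}


def run_action(action, agent_scope, approve):
    """Run an agent action; prod actions need explicit human approval first.

    approve is a callable that takes the action and returns True or False.
    """
    if agent_scope == "prod" and not approve(action):
        raise PermissionError(f"human reviewer rejected: {action}")
    # Placeholder for the real side effect (API call, migration, etc.).
    return f"executed: {action}"


env = {"PROD_DB_URL": "postgres://prod", "STAGING_DB_URL": "postgres://staging"}
staging_env = scrubbed_env(env, "staging")  # no PROD_DB_URL in here
```

The point is that the staging agent cannot leak or misuse a credential it was never handed, and the prod path fails closed when nobody approves.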
I built a community tool for exactly this, based on privacy-first principles but around the what. It’s workflow-based and not “put your sensitive data into ChatGPT and hope it captures the right stuff”. Mostly built for security folks, but anyone can use it.
I have been seeing this messaging everywhere and I have not noticed this. I have had the inverse with 4.7 over 4.6.
I think people aren’t reading the system cards when they come out. They explicitly explain your workflow needs to change. They added more levels of effort and I see no mention of that in this post.
Did y’all forget Opus 4? That was not that long ago, and Claude was essentially unusable then. We are at peak wizardry right now and no one is talking positively. It’s all doom and gloom around here these days.
> They explicitly explain your workflow needs to change
How about - don't break my workflow unless the change is meaningful?
While we're at it, either make y in x.y mean "groundbreaking", or "essentially same, but slightly better under some conditions". The former justifies workflow adjustments, the latter doesn't.
I have used nothing but Sonnet and composer for a year and they work fine. LLMs were certainly not unusable before and Opus is certainly not necessary, especially considering the cost. People get excited by new records on benchmarks but for most day to day work the existing models are sufficient and far more efficient.
Couldn’t agree more with this sentiment. Bootstrapping while leaning AI-first means I can iterate on a few arch designs and actually test them, stress them, and rip them out if needed, within a few hours.
The technical debt problem is real, but it’s manageable as long as, after a large session across 10-12 repos over a couple of days, I can do a sweep for loose ends and kill dead code left over from an old implementation. It’s less about building a piece, and more like building version 1 of a feature and then building version 2 a week later instead of 6 months later.
Building Cabreza Command (https://cabreza.com/product). Most critical infrastructure orgs manage their OT security program across SharePoint folders, Excel trackers, and slide decks that get updated once a year. The actual state of the program lives in someone’s head.
Command replaces that with a platform that maps to their real sites, real assets, and real operational constraints, so they can actually run the program, not just document it.
Consulting firms use it to deliver more engagements with the same team. Asset owners use it to keep the program alive between engagements, or run one themselves.
I love this and it’s very clean. Only piece of feedback: put the play again button where the scores are, as that is where the thumb sits on the phone to do the tapping. Mind-numbingly groovy.
I am not sure about the “build it yourself is expensive” argument. If you were non-technical, you probably shouldn’t try to build it yourself, but for technical folks, that’s not a stretch.
Fair point — a technical person can absolutely wire up pre-signed URLs and a basic UI. The target here is more the IT lead who gets the ticket from finance or legal: “we need access to these files on S3.” They could build it, but it’s a week of work, then ongoing maintenance, auth, permissions, audit logging, SSO… BucketDrive is a CloudFormation stack that gives them all of that in 10 minutes, running serverless in their own account. The “build vs buy” math changes when the person asking isn’t the one building.
I didn’t understand the hype on Medvi in the first place. I don’t think this type of credit can be given to “two people” when it actually makes money by being a middle-man service, selling other people’s skills and other companies’ products.
Also, let’s not forget. The developer is rarely the person pitching the feature, and is normally given the constraints and the PRD…
Soooo people can keep tiptapping on the keyboard, but eventually they need to open their mind to the possibility that “the old way” is actually dead.