Hacker News | jcims's comments

Still with pretty low latency (25-35ms) as well, similar to the Standby (aka pause) state you can put the account into for $5/mo.

Mind if I ask what models you’re using for CTF? I got out of the game about ten years ago and have recently been thinking about dipping my toes back in.

Yep -- one fun experiment early in the video shows that moving from Sonnet 4.5 to Opus 4.5 gave a 20% lift

We do a bit of model-per-task: most calls send targeted, limited-context fetches to faster higher-tier models (frontier, but no heavy reasoning tokens), while occasional larger data dumps (logs/dataframes) go to faster-and-cheaper models. Commercially, we're steering folks right now more toward OpenAI / Azure OpenAI models, but that's not at all inherent. OpenAI, Claude, and Gemini can all be made to perform well here using what the talk goes over.
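A minimal sketch of the model-per-task idea, to make it concrete. The task taxonomy and the model-tier names below are illustrative placeholders, not what we actually run:

```python
# Illustrative model-per-task router: pick a (hypothetical) model tier
# based on the kind of call being made. All names are placeholders.

TASK_ROUTES = {
    "targeted_context_fetch": "frontier-fast",   # frontier tier, no heavy reasoning tokens
    "bulk_log_dump": "cheap-fast",               # large logs/dataframes
    "final_synthesis": "frontier-reasoning",     # small volume, high stakes
}

def route_model(task_type: str, payload_bytes: int) -> str:
    """Choose a model tier for a task; oversized payloads get demoted
    to the cheaper tier regardless of task type."""
    model = TASK_ROUTES.get(task_type, "frontier-fast")
    if payload_bytes > 200_000 and model != "cheap-fast":
        model = "cheap-fast"
    return model
```

The point is just that routing is a cheap, deterministic decision made before any model is called, so the expensive tiers only ever see small, targeted context.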

Some of the discussion earlyish in the talk and in the Q&A after is on making OSS models production-grade for these kinds of investigation tasks. I find them fun to learn on and encourage homelab experiments, and for copilots, you can get mileage. For heavier production efforts, I typically do not recommend them for most teams at this time for quality, speed, practicality, and budget reasons, if they have the option to go with frontier models. However, some bigger shops are doing it, and I'd be happy to chat about how we're approaching quality/speed/cost there (and we're looking for partners on making this easier for everyone!)


Nice! Thank you!

I just did an experiment yesterday with Opus 4.5 operating in agent mode in VS Code Copilot. Handed it a live STS session for AWS to see if it could help us troubleshoot an issue. It was pretty remarkable seeing it chop down the problem space and arrive at an accurate answer in just a few minutes.
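For anyone wanting to try the same, a sketch of what "handing it a live STS session" can look like in practice: you export temporary STS credentials into the agent's environment rather than long-lived keys. The helper functions below are hypothetical; the environment variable names are the standard ones the AWS CLI and SDKs read:

```python
# Sketch: package temporary STS credentials as environment variables so an
# agent's shell session only ever sees short-lived, scoped credentials.
# Credential values here are placeholders, not real secrets.

def sts_env(access_key_id: str, secret_access_key: str, session_token: str) -> dict:
    """Return the standard AWS env vars that CLIs/SDKs pick up."""
    return {
        "AWS_ACCESS_KEY_ID": access_key_id,
        "AWS_SECRET_ACCESS_KEY": secret_access_key,
        "AWS_SESSION_TOKEN": session_token,  # presence marks these as temporary creds
    }

def as_exports(env: dict) -> str:
    """Render as shell `export` lines to paste into the agent's terminal."""
    return "\n".join(f"export {k}={v}" for k, v in env.items())
```

When the session expires, the agent's access dies with it, which is a lot more comfortable than giving a tool your long-lived keys.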

I'll definitely check out the video later. Thanks!


As multi-step reasoning and tool use expand, they effectively become distinct actors in the threat model. We have no idea how many different ways the alignment of models can be influenced by the context (the Anthropic paper on subliminal learning [1] was a bit eye-opening in this regard) and subsequently have no deterministic way to protect it.

1 - https://alignment.anthropic.com/2025/subliminal-learning/


I’d argue they’re only distinct actors in the threat model as far as where they sit (within which perimeters), not in terms of how they behave.

We already have another actor in the threat model that behaves equivalently as far as determinism/threat risk is concerned: human users.

Issue is, a lot of LLM security work assumes they function like programs. They don’t. They function like humans, but run where programs run.


Most of this physical infrastructure is trivially identifiable in Google Maps.

Whether or not we work at the same place, we work at the same place.

LLMs have allowed me to start using jq for more than pretty printing JSON.
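For a concrete (made-up) example of the kind of query I mean: a jq filter like `.users[] | select(.age > 30) | .name` pulls the names of users over 30 out of a JSON document. The plain-Python equivalent, for comparison:

```python
import json

# Sample document; the equivalent jq filter is:
#   .users[] | select(.age > 30) | .name
doc = json.loads('{"users": [{"name": "ana", "age": 41}, {"name": "bo", "age": 22}]}')

names = [u["name"] for u in doc["users"] if u["age"] > 30]
print(names)  # -> ['ana']
```

The jq version is shorter and pipeline-friendly; the catch has always been remembering the filter syntax, which is exactly the part an LLM is happy to write for you.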

Give https://rcl-lang.org/#intuitive-json-queries a try! It can fill a similar role, but the syntax is very similar to Python/TypeScript/Rust, so you don’t need an LLM to write the query for you.

Nice! Thanks!

The issue isn’t jq’s syntax. It’s that I already use other tools that fill that niche, and have done for as long as jq has been a thing. And frankly, I personally believe the other tools are superior, so I don’t want to fall back to jq just because someone on HN tells me to.

It feels like there are several conversations happening that sound the same but are actually quite different.

One of them is whether or not large models are useful and/or becoming more useful over time. (To me, clearly the answer is yes)

The other is whether or not they live up to the hype. (To me, clearly the answer is no)

There are other skirmishes around capability for novelty, their role in the economy, their impact on human cognition, if/when AGI might happen, and the overall impact on the largely tech-oriented community on HN.


My thinking as well.

How could something so remarkably stable and functionally indistinguishable among its peers also be so complex?


Yeah, it's a great question. I don't know the answer, but the people who study it seem to strongly suspect that it is highly complex in this sense. Otherwise they would be looking for simpler representations instead of running massive simulations.

To your question, I think there is an elegant answer actually; most composite particles in QCD are unstable. They're either made out of equal parts matter and antimatter (like pions) or they're heavier than the proton, in which case they can decay into one (or more) protons (or antiprotons). If any of the internal complexities of the proton made it distinguishable from other protons, they wouldn't both be protons, and one could decay into the other. Quantum mechanics also helps to keep things simple by forcing the various properties of bound states to be quantized; there isn't a version of a proton where e.g. one of the quarks has a little more energy, similar to how the energies of atomic orbitals are quantized.
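To make the orbital analogy concrete (this is the standard textbook result for a hydrogen atom, added for illustration): the bound-state energies take only discrete values,

```latex
E_n = -\frac{13.6\ \text{eV}}{n^2}, \qquad n = 1, 2, 3, \ldots
```

so there is no "slightly more energetic" version of the ground state; quantization of the same kind is what rules out a proton whose quarks carry a little extra energy.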


I had the same thoughts reading this. I think there’s an optimal blend of blurters and thinkers, one isn’t better than the other. I find that I do both, it just kind of depends on my comfort with the subject matter.


This is one of those areas where family starts to influence decisions. My wife and I had kids between 24 and 28. From that point forward, 'supporting the family' took priority over personal fulfillment.

Now that our kids are grown and self-supporting, it's wild how much simpler the risk calculation is. But at 52, with engineering manager as the dominant role in my CV, I'm not particularly appealing to the small companies making big moves that I'm interested in.


Nobody with a family has ever found a way to make a sacrifice and leave a job they felt was wasting their life?


What use is extrapolating what I said to its extreme end? In 1999 I quit my full time secure job to start a company. My wife was a stay at home mom, we had a new baby and a new mortgage for a house that was still being built. Yes it's possible, but counterexamples don't mean it's not a factor.


Your example is as much a counter example as mine is…



Well-said


