Hacker News | aizk's comments

How powerful will Opus become before they decide to not release it publicly like Mythos?

They are planning to release a Mythos-class model (from the initial announcement), but they won't until they can trust their safeguards + the software ecosystem has been sufficiently patched.

It seems they nerf it, then release a new version with previous power. So they can do this forever without actually making another step function model release.

Sounds backwards to me.

Microsoft is collapsing under the weight of their own bloat.

Probably right now because they're keeping it for themselves?

I've been messing with using Claude, Codex, and Kimi for reverse engineering at https://decomp.dev/, and it's a ton of fun. It works well because matching bytes is a scoring function that's easy for the models to understand and make progress on.
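To make the "matching bytes as a scoring function" idea concrete, here is a minimal sketch. It is not how decomp.dev or any particular tool scores matches; the function name `match_score` and the position-by-position comparison are assumptions chosen for illustration.

```python
def match_score(candidate: bytes, target: bytes) -> float:
    """Return the fraction of byte positions that agree (0.0 to 1.0).

    Hypothetical scoring rule: compare position by position and divide by
    the longer length, so extra or missing bytes count as mismatches.
    """
    if not candidate and not target:
        return 1.0  # two empty blobs match trivially
    longest = max(len(candidate), len(target))
    same = sum(a == b for a, b in zip(candidate, target))
    return same / longest


if __name__ == "__main__":
    target = bytes([0x01, 0x02, 0x03, 0x04])
    attempt = bytes([0x01, 0x02, 0xFF, 0x04])
    print(f"score: {match_score(attempt, target):.2f}")
```

A single number like this gives a model an unambiguous signal: any change that raises the score is progress, and 1.0 means the output is byte-identical.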

I want to get into RE with AI. Which model are you liking the most?

How do you guys manage regressions as a whole with every new model update? A massive test set of e2e problem solving seeing how the models compare?

A mix of evals and vibes.

"Evals and vibes": can I put that on a t-shirt?

What's that ratio exactly?

Are you doing any Digital Twin testing or simulations? I imagine you can't test a product like Claude Code using traditional means.

Remember when they shipped that version that didn't actually start/run? At work we were goofing on them a bit, until I said, "Wait, how did their tests even run on that?" And we realized that whatever their CI/CD process is, it wasn't at the time running on the actual release binary. I can imagine their variation on how most engineers think about CI/CD is probably indicative of some other patterns (or lack of traditional patterns).

As someone who used to work on Windows, I kind of had a vision of an e2e testing harness similar in scope to the one for Windows Vista/7 (knowing about bugs/issues doesn't mean you can necessarily fix them, hence Vista then 7), and I imagined that Anthropic must provide some Enterprise guarantee backed by that testing matrix. Long way of saying, I think they might just YOLO regressions by constantly updating their testing/acceptance criteria.

Why not provide pinnable versions or something? This episode and the wasted 2 months of suboptimal productivity hit on the absurdity of constantly changing the user/system prompt and doing so much of the R&D and feature development in two brittle prompts with unclear interplay. So until there's something like a composable system/user prompt framework they reliably develop tests against, I personally would prefer pegged, selectable versions. But each version probably has known critical bugs they're dancing around, so there is no version they'd feel comfortable making a pegged stable release.


That was actually an interesting case of something CI/CD doesn't tend to catch.

It failed to start because it failed to parse the published release notes.

In the CI/CD system it would have passed, because the release notes that broke it hadn't been published yet.

Those release notes also took down previous versions of claude-code, so rolling back didn't help users.

The breakage wasn't a change in the software, it was a change in the release notes which coincided with the change in the software.

Now, should it have been grabbing release notes and parsing them? No, that's unbelievably dumb (and potentially dangerous), but it wasn't an issue with missing CI/CD, but an interesting case-study in CI/CD gaps and how CI/CD can actually lead to over-confidence.
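For anyone wondering what closing this kind of gap might look like, here is a minimal sketch in which startup treats externally published content as optional. The JSON-list format and the names `load_release_notes` and `start_app` are assumptions for illustration; the actual release notes format and claude-code's code are not described here.

```python
import json


def load_release_notes(raw: str) -> list[str]:
    """Parse optional release notes without ever raising.

    Assumed format (hypothetical): a JSON list of strings. Anything else
    is treated as "no notes available" instead of a fatal error.
    """
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return []
    if not isinstance(data, list):
        return []
    return [item for item in data if isinstance(item, str)]


def start_app(raw_notes: str) -> str:
    """Start up regardless of whether the notes could be parsed."""
    notes = load_release_notes(raw_notes)
    return f"started with {len(notes)} release notes"


if __name__ == "__main__":
    print(start_app('["Fixed a bug", "Added a feature"]'))
    print(start_app("<<malformed notes published after CI ran>>"))
```

The design point is that content published after the build cannot be covered by the build's tests, so the code path that consumes it has to fail soft. A CI job that feeds deliberately malformed notes into startup would exercise exactly that path.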


About once a week I get a claude "auto update" that fails to start with some bun error on our Linux machines. It's beyond laughable.

I use a self-documenting recursive workflow: https://github.com/doubleuuser/rlm-workflow

This has happened before. It was called anon kode.


In light of these nonstop supply chain attacks: tonight I created /supply-chain-audit, a simple Claude Code skill that fetches info on the latest major package vulnerability, then scans your entire ~/ and gives you a report on all your projects.

https://github.com/IsaacGemal/claude-skills

It's a bit janky right now but I'd be interested to hear what people think about it.
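For a rough idea of what the scanning half of such an audit could involve, here is a minimal sketch. It is not how the linked skill is implemented; the function name `find_affected`, the choice to look only at npm `package-lock.json` files, and the exact-version matching are all assumptions for illustration.

```python
import json
import os


def find_affected(root: str, package: str, bad_versions: set[str]) -> list[tuple[str, str]]:
    """Walk `root` and report lockfiles that pin `package` at a bad version.

    Hypothetical helper: reads the "packages" map of npm lockfiles
    (lockfile v2/v3), where keys look like "node_modules/<name>".
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip installed dependencies; the project lockfile already lists them.
        dirnames[:] = [d for d in dirnames if d != "node_modules"]
        if "package-lock.json" not in filenames:
            continue
        lock_path = os.path.join(dirpath, "package-lock.json")
        try:
            with open(lock_path, encoding="utf-8") as f:
                lock = json.load(f)
        except (OSError, ValueError):
            continue  # unreadable or malformed lockfile: skip it
        packages = lock.get("packages") if isinstance(lock, dict) else None
        if not isinstance(packages, dict):
            continue
        for key, meta in packages.items():
            # Nested installs look like "node_modules/a/node_modules/<name>".
            name = key.rpartition("node_modules/")[2]
            if name == package and isinstance(meta, dict) and meta.get("version") in bad_versions:
                hits.append((lock_path, meta["version"]))
    return hits


if __name__ == "__main__":
    # "example-pkg" and "1.2.3" are placeholders, not a real advisory.
    for path, version in find_affected(os.path.expanduser("~"), "example-pkg", {"1.2.3"}):
        print(f"{path}: example-pkg@{version}")
```

This only reads files and never executes anything from the projects it scans, which keeps the blast radius small if the scanner itself is pointed at a compromised directory.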


Skills are a great attack vector as well.


That sounds terrifying. Stay out of my ~/ thank you very much.


Definitely there's some truth to it


I have a story with Benji.

Last year I went viral, and Benji was the first person to interview me. It was a really cool experience: we chatted via Twitter DMs, and he wrote a piece about my work. Overall he did a decent job.

Then, 6 months later a separate project I was adjacent to was starting to pick up steam. I reached out to him asking if he wanted to cover us. No response.

Then, TechCrunch wrote an article on our project.

I reached out to Benji again saying "Hey would you like to chat again, now we have some coverage?" And he finally responded, but said he couldn't report on me because he had a directive that he could only report on things that didn't have any prior or pre-existing coverage (?)

I thought that was rather strange, especially since we already had built up a relationship.

I don't really have a moral or lesson to this story, other than that journalism can be rather opaque sometimes.

Oh, one other tip for anyone reading this: if journalists ever reach out to you, communicate in writing, not by phone call, so you can be VERY precise in your wording.


> Then, 6 months later a separate project I was adjacent to was starting to pick up steam. I reached out to him asking if he wanted to cover us. No response. [...]

> I reached out to Benji again saying "Hey would you like to chat again, now we have some coverage?" And he finally responded, but said he couldn't report on me because he had a directive that he could only report on things that didn't have any prior or pre-existing coverage (?)

> I thought that was rather strange, especially since we already had built up a relationship.

The US mentality might be different, but at least having grown up and living in Germany, such an annoying hustler who wants to use some journalist as a marketing influencer for his private project is a huge no-no. In other words: it is a very reasonable decision (perhaps even the only right one) for any journalist to fob off such a hustler.


That is the US mentality too outside of a small but persistent bubble of hustlers, supported by their symbiotic relationships with publications that need them just as much.


If simply saying "Hi, would you be interested in covering this?" characterizes me as an annoying hustler, then you know what, I'll take it.


I'm not hustling enough to be honest.


>The US mentality might be different, but at least having grown up and living in Germany, such an annoying hustler who wants to use some journalist as a marketing influencer for his private project is a huge no-no. In other words: it is a very reasonable decision (perhaps even the only right one) for any journalist to fob off such a hustler.

Yeah there seems to be a thing where in the US, what's seen as "selling yourself" or "putting your best foot forward" is considered excessive self-promotion / tall poppy behavior in other cultures.


> Yeah there seems to be a thing where in the US, what's seen as "selling yourself" or "putting your best foot forward" is considered excessive self-promotion / tall poppy behavior in other cultures.

It is a uniquely US thing & is a common struggle for foreigners who are new to US corporate culture.

Can be especially tricky if you are a 3rd culture individual that has to manage relationships spanning different cultures in your daily life. You can't easily turn "hustler" mode off and on.

It is a huge faux pas in almost every non-western culture and can wreak havoc in your personal life.


Slightly off topic:

Why is excessive self-promotion considered "putting your best foot forward"?

I understand that you need the money, so you do self-promotion. But this is clearly not "putting your best foot forward", but a "put a bad foot (annoy other people by excessive self-promotion) forward because you need the money", i.e. what many US-Americans do is by my understanding the opposite of this life advice which they give.


You're coming off as clearly not understanding the other side here. Obviously "putting your best foot forward" is not simultaneously "annoy other people by excessive self-promotion" in the mind of a single person.

There are two different types of people, and they think of the same action in two different ways.


I could equally well ask why putting your best foot forward would be considered excessive self-promotion. Consider the example of contacting a journalist. Why would it be a huge no-no? Why can't the journalist just treat it as any other lead? Skim the email, if they're not interested, ignore or delete. That's not a significant burden. If they are interested, such emails actually help the journalist do their job, by providing ideas for stories.


I'm a journalist. As a general rule, if someone approaches me with a pitch for a feature or investigation (not news piece) that was already published elsewhere, I'll turn it down. To be fair, I turn down all PR pitches, but there are journalists who don't but still want an exclusive.

It sometimes happens that you spend weeks or months working on a story, only to be scooped by another publication. It sucks, especially if you think your story is the better one, but unless you can pivot or add a substantial amount of new insight, it won't come out.


Sometimes people get busy and overwhelmed, but they don't know how to say no.


I know a lot of people that don't get through their email every week, for example. Even saying no takes too much time, with the volume of communication required by daily work.


Very few people email me except for endless newsletters that I accidentally signed up for. I try to unsubscribe from a few every day but it seems never-ending.

In the event that you actually do end up emailing me, it's contingent on me actually checking my personal email, which I never do when I'm not working, and only sometimes do during work hours.

If it's you asking me a favor that I'm not in the mental space for, I'll mark the message unread as a reminder to get to it later.

Maybe I just have weird email habits, but I can get away with this because email is not a heavy part of my job.

That being said, one guy was pitching me on something several times a month for several months. I just recently responded to him and apologized because of x y z. He said don't worry and we had a fruitful conversation later.

So, follow through is important!


Their repeat emailers might win eventually!

Passing on some life advice to anyone who’d benefit: people are busy. Maybe they didn’t respond because you’re annoying? No, no. Feel it out and text again a while later. Give them another shot and get to the top of their inbox or messages again.

After someone told me that, I realized it’s true!


This is an experience I've had with reporters multiple times. They don't like to write about the same thing twice.


My hunch is Ars will copy/reword/repost articles from real news sources (basically free for Ars) or do its own reporting for exclusive stories (costs reporters some time). No reason for Ars to spend reporter time on something they can copy.


[flagged]


They have a website, a twitter handle, and a GitHub profile with their real name.

