I suppose I actually agree with you, and I would give the same advice to junior engineers too. I've spent my career going further down the stack than I really needed to for my job and it has paid off: everything from assembly language to database internals to details of unix syscalls to distributed consensus algorithms to how garbage collection works inside CPython. It's only useful occasionally, but when it is useful, it's for the most difficult performance problems or nasty bugs that other engineers have had trouble solving. If you're the best technical troubleshooter at your company, people do notice. And going deeper helps with system design too: distributed systems have all kinds of subtleties.
I mostly do it because it's interesting and I don't like mysteries, and that's why I'm relearning transformers, but I hope knowing LLM internals will be useful one day too.
While RLVR is neat, it is still an 'offline' learning model that just borrows a reward function similar to RL.
And did you not read the entire post? Karpathy basically calls out the same point that I am making regarding RL which "of course can be exploited to help move the needle on benchmarks":
> Related to all this is my general apathy and loss of trust in benchmarks in 2025. The core issue is that benchmarks are almost by construction verifiable environments and are therefore immediately susceptible to RLVR and weaker forms of it via synthetic data generation. In the typical benchmaxxing process, teams in LLM labs inevitably construct environments adjacent to little pockets of the embedding space occupied by benchmarks and grow jaggies to cover them. Training on the test set is a new art form
Regarding:
> I really don't know how to reply to this part without sounding insulting, so I won't.
Relevant to citing him: Karpathy has publicly praised some of my past research in LLMs, so please don't hold back your insults. A poster on HN telling me I'm "not using them right!!!" won't shake my confidence terribly. I use LLMs less this year than last year and have been much more productive. I still use them, LLMs are interesting, and very useful. I just don't understand why people have to get into hysterics trying to make them more than that.
I also agree with Karpathy's statement:
> In any case they are extremely useful and I don't think the industry has realized anywhere near 10% of their potential even at present capability.
But magical thinking around them is slowing down progress imho. Your original comment itself is evidence of this:
> I would strongly caution anyone who thinks that they will be able to understand or explain LLM behavior better by studying the architecture closely.
I would say "Rip them open! Start playing around with the internals! Mess around with sampling algorithms! Ignore the 'win market share' hype and benchmark gaming and see just what you can make these models do!" Even if restricted to just open, relatively small models, there's so much more interesting work in this space.
(I've been a cross platform numerical developer in GIS and geophysics for decades)
Serious Windows power users, and current and former Windows developers and engineers, swear by Chris Titus Tech's Windows Utility.
It's an open PowerShell suite, a collaboration by hundreds maintained by an opinionated coordinator, that allows easy installation of common tools, easy setting of update behaviours, easy tweaking of telemetry and AI add-ons, and easy creation of custom ISO installs and images for VM application (a dedicated stripped-down Windows OS for games or a Qubes shard).
It's got a lot of hover tooltips to help with choices and avoid surprises, and you can always look at the scripts that are run if you're suspicious.
" Windows isn't that bad if you clean it out with a stiff enough broom "
That said, I'm setting my grandkids up with Bazzite decks and forcing them to work in CLIs for a lot of things to get them used to seeing things under the hood.
Valve is practically singlehandedly dragging the Linux ecosystem forward in areas that nobody else wanted to touch.
They needed Windows games to run on Linux so we got massive Proton/Wine advancements. They needed better display output for the deck and we got HDR and VRR support in wayland. They also needed smoother frame pacing and we got a scheduler that Zuck is now using to run data centers.
It's funny to think that Meta's server efficiency is being improved because Valve paid Igalia to make Elden Ring stutter less on a portable Linux PC. This is the best kind of open source trickledown.
One of the most interesting coding agents to run locally is actually OpenAI Codex, since it has the ability to run against their gpt-oss models hosted by Ollama.
Side project: A native macOS app in Swift that runs locally and uses AI to clean and organize files by moving them into the best-matching folders. No backend or accounts. https://floxtop.com
Full-time: C++ work on nearby connectivity (bluetooth) for embedded / industrial devices (factory equipment). Deep stack, hardware constraints, long lifecycles, high reliability.
Non-web work feels very different: stronger constraints, slower but deliberate releases, and bugs are much more expensive. There’s a lot of interesting software being built far away from HTTP and browsers.
Me too. When they removed the option to download books I liberated everything I had ever bought, moved to Kavita+koreader and will never buy a kindle book again.
I jailbroke both kindles. And use koreader on them which now supports progress sync with Kavita which is amazing! So I don't really lose functionality.
> In 2002, the amusement continued when a network security outfit discovered an internal document server wide open to the public internet in Microsoft's supposedly "private" network, and found, among other things, a whitepaper[0] written by the hotmail migration team explaining why unix is superior to windows.
AI hype is real, but we ought to start also examining anti-AI-hype-hype. It's become fashionable to rage against AI as a whole with about the same amount of understanding that the MBA hype edgelords have when they push AI as a cure-all, and both are a bad look.
To balm the enraged; look, I agree with you, the hype is indeed out of control. But like, let the vultures spend all their monies. Eventually the bubble itself will go the way of NFTs and we'll all be able to buy GPUs and SSDs again. Hopefully.
That said, there's an important chunk of discourse that gets shouted down and it really shouldn't. For just a moment, table the issues that come out of "AI as an Everything Replacement" and think of the new things that come out of this tech. On-demand tutors that never tire. Actually viable replacement for search. Large heterogenous datasets can now be rapidly parsed, by an individual, for specific insights. Personal dev teams at a fraction of the cost that now make it possible for people with absolutely bugfuck ideas to actually try them without worrying about wasted time or resources - we are going to see a vibrance in the world that was not there before.
It is not an unqualified or unmitigated good. Hell, I'll even grant that it may be a net negative - but I don't know either way, and I don't think anyone else does either. Not with any significant confidence. It just feels like we've skipped the part of the discussion where discourse occurs and gone right to "Pro" and "Anti" camps with knives at throats and flushed, sweaty faces.
My startup is building agents for automating pentesting. We started experimenting with Llama 3.1 last year. Pentesting with agents started getting good around Sonnet 3.5 v1.
The switch from Sonnet 4 to 4.5 was a huge step change. One of our beta testers ran our agent on a production Active Directory network with ~500 IPs and it was able to privilege escalate to DA within an hour. I've seen it one-shot scripts to exploit business logic vulnerabilities. It will slurp down JS from websites and sift through for API endpoints, then run a Python server to perform client-side analysis. It understands all of the common pentesting tools with minor guard rails. When it needs an email to authenticate it will use one of those 10 minute fake email websites with curl and playwright. I am conservative about my predictions but here is what we can learn from this incident and what I think is inevitably next:
Chinese attackers used Anthropic (a hostile and expensive platform) because American SOTA is still ahead of Chinese models. Open weights is about 6-9 months behind closed SOTA. So by mid 2026 hackers will have the capability to secretly host open weight models on generic cloud hardware and relay agentic attacks through botnets to any point on the internet.
There is an arms race between the blackhats and private companies to build the best hacking agents, and we are running out of things the agent CAN'T do. The major change from Claude 4 - Claude 4.5 was the ability to avoid rate limiting and WAF during web pentests, and we think that the next step for this is AV evasion. When Claude 4.7 comes out, if it is able to effectively evade anti-virus, companies are in for a rude awakening. Just my two cents.
> The tasks in Butter-Bench were inspired by a Rick and Morty scene [21] where Rick creates a robot to pass butter. When the robot asks about its purpose and learns its function, it responds with existential dread: “What is my purpose?” “You pass butter.” “Oh my god.”
I wouldn't have got the reference if not for the paper pointing it out. I think I'm a little old to be in the R&M demographic.
> My Dad lost his Dad at the age of 34, which is no age at all in the grand scheme of things. By contrast I still have my Dad at the age of 60, which has meant an extra quarter century of guidance, support, advice, love and always being there. How lucky am I?
I lost my father when I was 30. I thought I’d been lucky because I’d had him through my “adult” life. Now I’m 40 and have a 2-year-old son, and over these past ten years I think it’s when I would have most liked to have him — when more questions came up about what he was really like as a person, beyond his role as a father. He died at 72 from lung cancer; he had been smoking since he was 13 and never went to the doctor. I guess I was lucky after all…
ImapGoose is on my radar to replace mbsync and imapnotify in my setup, but I think it's IMAP <-> maildir only, which makes sense for the intended use case of local mail
As a 60+ year old guy who has been at the same employer for the past 20 years, I thankfully evaded all that leetcode crap. My interview style has always been of the sort described -- ask in depth questions about something they have worked on.
The telltale sign of a padded resume is when the person acts like a cork -- you try to push down deeper and they always pop back up to surface level answers. Some give up the game quickly and admit that they were on the team that did the thing in question, despite their resume saying they did the thing. Other people are very practiced and unflappable and just bob back up over and over without shame.
For new engineers with, say, three years of experience or less, I can forgive overstating their experience and I'll shift to asking them to describe their part of that project. But for people with a few years on their resume, that behavior is definitely a deal breaker.
Another line of questioning I take for people with experience is questions like: what are the tradeoffs of directed testing vs random testing? When do you have enough confidence in your testing that you say the product is ready to ship? What are some examples of where bugs made it out the door, and what was your process after that happened? What are some reasons why every part of a design might pass its unit tests, but the total design has failures?
These should be easy to discuss for anyone with experience, but there are a surprising number of people who fall flat. Similarly, I am shocked at how often I ask someone to write some code in any language they want to, even pseudocode, for a simple problem and they are unable to do it. E.g., given a stream of numbers, maintain the largest two numbers seen so far.
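For reference, one way the "largest two numbers" exercise could be answered is sketched below in Python. This is only an illustration, not the answer the interviewer necessarily has in mind. The class name `TopTwo` and its `add` method are made up for this example, and it assumes duplicates count as separate values (so a stream of 5, 5 yields 5 and 5).

```python
from typing import Optional


class TopTwo:
    """Tracks the two largest values seen so far in O(1) memory.

    Hypothetical helper for illustration; assumes repeated values
    count separately (e.g. 5, 5 gives first=5, second=5).
    """

    def __init__(self) -> None:
        self.first: Optional[float] = None   # largest value seen so far
        self.second: Optional[float] = None  # second largest seen so far

    def add(self, x: float) -> None:
        if self.first is None or x > self.first:
            # New maximum: the old maximum becomes the runner-up.
            self.first, self.second = x, self.first
        elif self.second is None or x > self.second:
            # Not a new maximum, but it beats the current runner-up.
            self.second = x


tracker = TopTwo()
for value in [3, 1, 4, 1, 5, 9, 2, 6]:
    tracker.add(value)
print(tracker.first, tracker.second)
```

Each new number costs at most two comparisons and nothing is stored beyond the two fields, which is the point of the question: no sorting and no buffering of the stream.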
I'm cautiously optimistic that Stalwart is a game-changer. I would normally stick with a simpler platform (like mox) if I wasn't interested in the groupware space. I've previously tried Nextcloud and SoGo and left disappointed, and have more or less been waiting for a project like this to come along.
Nextcloud was such a terrible experience for me (the file sharing/storage was good, but the groupware aspect was incredibly buggy). But knowing that Nextcloud is partnering with Stalwart to hopefully overhaul their stack, Opencloud is developing their JMAP integration, and Mozilla/Thunderbird is using it too (they already have a webmail in development here: https://github.com/thunderbird/stormbox)... we might finally see some exciting development in this space. And now is also a ripe time, as there seems to be a perfect storm of people wanting to get away from Big Tech platforms.
This should be pretty straightforward to do with an IMAP <-> IMAP syncing tool, like mbsync [1]. You'd run it periodically in the background to sync the remote IMAP to Stalwart's local IMAP server, and Stalwart can then automatically serve that via JMAP, doing the translation internally.
I was originally thinking you'd need to go remote IMAP <-> maildir <-> Stalwart IMAP, which would be really complicated, but I think the IMAP <-> IMAP should work fine.
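To make the "run it periodically in the background" part concrete, here is a rough Python sketch that just shells out to mbsync on a timer. It assumes mbsync is installed and that your mbsync config already defines a channel pairing the remote IMAP store with Stalwart's IMAP store; the channel name `remote-to-stalwart` is a placeholder, and `build_mbsync_command` and `sync_periodically` are names made up for this example. In practice a cron job or systemd timer would do the same job.

```python
import subprocess
import time
from typing import Callable, List, Optional


def build_mbsync_command(channel: Optional[str] = None,
                         config_path: Optional[str] = None) -> List[str]:
    """Build the mbsync command line (hypothetical helper).

    With no channel given, sync all configured channels via -a.
    """
    cmd = ["mbsync"]
    if config_path is not None:
        cmd += ["-c", config_path]  # -c selects an alternate config file
    cmd.append(channel if channel is not None else "-a")
    return cmd


def sync_periodically(cmd: List[str],
                      interval_seconds: float,
                      rounds: int,
                      run: Callable = subprocess.run,
                      sleep: Callable = time.sleep) -> List[int]:
    """Run the sync command `rounds` times, pausing between runs.

    Returns the exit code of each run; a failed run does not stop the loop.
    """
    exit_codes = []
    for i in range(rounds):
        result = run(cmd, check=False)
        exit_codes.append(result.returncode)
        if i + 1 < rounds:
            sleep(interval_seconds)
    return exit_codes


if __name__ == "__main__":
    # Placeholder channel name; sync every 5 minutes, 12 times (one hour).
    command = build_mbsync_command(channel="remote-to-stalwart")
    print(sync_periodically(command, interval_seconds=300, rounds=12))
```

The `run` and `sleep` parameters are injectable only so the loop can be exercised without actually invoking mbsync or waiting.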
We're building a unified platform for credit and financial data to power financial inclusion globally. Our mission is to unlock economic opportunities for consumers historically excluded from traditional credit systems by transforming diverse data into actionable risk insights. We are a YC company backed by investors from Canapi Ventures, Kleiner Perkins, General Catalyst, and Index Ventures.
-> Hiring software engineers from Mid-Level to Staff
You'll work on enterprise-grade APIs, data integrations, advanced analytics, and seamless user experiences. We're looking for engineers who thrive in fast-paced environments and want to make a meaningful impact on financial inclusion.
PostHog helps engineers build better products by combining product analytics, feature flags, session replay, a data warehouse, CDP and many more.
* we have a public handbook (posthog.com/handbook) if you want to learn how we work, pay and more in complete detail.
* we are growing through more autonomy and transparency not through process.
* we have a ton of scale and a bunch of super interesting technical problems to solve
* we're building 20 more products over the next couple of years, so you could end up building one of those
* we need: product engineers, a developer who loves writing, a developer who can do marketing, backend engineers, AI product engineers, technical account managers and technical customer success managers.
Scale AI | Product Security Engineer | TypeScript, Node, Python, AWS | Full-time | Hybrid in San Francisco, CA or NYC or Remote
This is for a team that I'm working closely with and helping grow. We're hiring for somebody with a hybrid SWE and Security background to help scale up the team.
It's currently one engineer who's got a ton on his plate, so the ideal person for this role is somebody who's interested in learning, digging in deep to fix issues (especially shipping PRs), and helping shape the future roadmap of the team. It is not an analyst role.
The role is primarily targeted at mid-career folks with a few years of experience (2+ years minimum). Those that are Senior/Staff level folks, especially those with a strong SWE background that are curious about security, are encouraged to apply if this role resonates strongly. (I personally transitioned from SWE -> ProdSec, years ago, and several other folks on the broader Security team have non-Security backgrounds too.)
Feel free to apply there, email me with your resume (on my profile), or add me on LinkedIn[0]. I'll try my best to answer questions and reply to everybody, but sometimes there is so much inbound that it isn't possible. Thanks!
JWP Connatix is the most comprehensive independent video technology and monetization platform, helping broadcasters, publishers, and advertisers deliver premium streaming and online video experiences while maximizing video revenue across all screens. The company offers an end-to-end platform that streamlines live and on-demand video with hybrid monetization models, unique data and insights, unmatched customer service, and the largest independent premium video marketplace, providing the entire media ecosystem with enhanced scale, transparency, and revenue.
We are looking for a skilled and adaptable AI Engineer to join our AI Proof of Concepts team at JWP Connatix. You'll be responsible for implementing AI-First development methodologies, integrating sophisticated AI tools into our software pipeline, and rapidly building MVP prototypes that demonstrate innovative solutions. This role offers the opportunity to work at the cutting edge of AI-integrated development while delivering high-impact prototypes in a fast-paced, iterative environment. The ideal candidate thrives in rapid prototyping environments, has hands-on experience with AI tool integration, and enjoys the challenge of quickly turning concepts into working demonstrations for stakeholder validation. Candidates should also know and work with AI code generation tools (Claude Code, Cursor, Copilot, etc).
If this sounds like something you'd be interested in please apply!
Chief Product Officer of Fastmail here. I see a lot of comments here from people that don't appear to have actually tried using the app, which is a little disappointing; don't knock it 'til you've tried it! Happy to answer any questions, but to answer the main ones that are popping up:
# Why Electron?
Because it lets us build an app that works well across all major platforms with the resources we have available. Building an email/contacts/calendar app is a huge undertaking. Doing it from scratch on each platform is just not feasible for us.
With Electron, we can maintain a single code base across all platforms so we can move faster, and keep feature parity everywhere. More than that though, we believe it lets us build a really great experience on each of these platforms, while offering a consistent UI for our customers across all their devices. Honestly, we can never out-native Apple because by definition whatever they do is "native", even if it sucks (Liquid Glass on the Mac is … not great UX). If that's your primary consideration, you will always be better with Apple's own Mail app, so it's pointless us trying to build something in that space. (And instead we work to also make Fastmail the best service to use Mail.app with — which we believe it is!)
# Why would you use this instead of the webmail?
If you prefer to keep Fastmail in your browser, great! You can do so. But we hear from many customers that they would rather not have their email mixed in with their tabs. With a separate app you can see it in the dock, Cmd-tab to it, make it your default email app system wide etc. It also lets us integrate with the system, like the Mac menu bar and native context menus.
# Why would you use this instead of an IMAP client?
If you've ever used the Fastmail web interface you probably already know the answer, but for everyone else…
1. It's a lot faster. Compared to Apple's Mail.app for example (which is a good IMAP client!):
- It resyncs way faster when you open the app, and uses a lot less data (JMAP is so much more efficient).
- Moving between messages is quicker. With Mail.app there's often a slight lag between clicking a message and it rendering. In Fastmail, it's usually instant.
2. It's more powerful. We provide the best standards support out there, and are also working to make the standards better. But there's always going to be more that we can do when we control both the server and the client. With the Fastmail UI you can:
- Add private memos to emails
- Mute conversations to ignore replies
- Pin important messages to the top of your inbox
- Schedule messages to send in the future (and not need your laptop to be online then for it to work)
- See related emails when you open your contacts.
- Add events straight into your calendar
- And much more (https://www.fastmail.com/features/).
3. It's got much better search. (Yeah, this is kind-of just "more powerful", but I'm calling it out because search sucks in most email clients.)
# And finally…
This is just a choice. We hope this is something that some of our customers will love, but we're not backing away from our commitment to open standards and encourage everyone to find what works best for them.
If you want a fast small (6 MB vs 318 MB) Fastmail client, use FMail3 (https://fmail3.appmac.fr). More options than this Electron shell around a web page.
Use tabs, windows, Hook and much more. JMAP API handles real quick notifications.
Have a look.
I am already using fmail3 [1], and before that fmail2, which is also a web wrapper but feels more native to the Mac than Electron apps. And I think it is written in Swift. So I don't know why Fastmail cannot do something similar after all these years.
The older I get, the worse I find the experience. I've had so many poor experiences with recruiters over the years, I think I'm becoming allergic.
It's getting harder to pierce through the BS layers with all that new meat on the market, and to make matters worse, recruiters are even less skilled than they ever were and are often offshored now. It's insane today.
When I'm on the hiring side, we can't find candidates, and on the other side I can't get through to the right people.
My advice is to put out feelers with anyone you've had a good relationship with in the past, often via your old networks and ex-colleagues; you'll jump to the front of the queue and avoid the pre-screening nonsense. They know what to expect from you and they would prefer to have a familiar face they can rely on in their internal struggles.
That's how I've landed my last 2 jobs without an interview.
The flip-side is always to be helpful to other colleagues. At some point, everyone needs a hand - be that guy - that lends it freely. They'll always look out for you in the future if you look out for them in the present. Become a knowledge source in the company and industry. Soak in as much as you can, become a reference, expose yourself to everyone's job to some degree, providing it isn't a dead zone of silos and the people feel right (not cagey). HTH.
This has been very frustrating. Another issue I often have is iOS doubling a word when I only typed it once.
Anecdotally, this seemed to start around the time Apple released the auto-complete bar above the keyboard. Normal typing got notably worse with that update and has seemed to get worse over time since. I found it easier to type on the 1.0 software than I do today.
I find myself using voice to enter text more these days, which is not my preferred method. It’s just what happens when I get so fed up with keyboard issues.
https://news.ycombinator.com/item?id=35990118