manofmanysmiles's comments

I'd like to see this with a proper local "instruction cache."

It might even be fun if the first call generates Python (or another language), and subsequent calls go through it. This "optimized" or "compiled" natural language is "LLMJitted" into Python. With interesting tooling, you could then click on the implementation and see the generated code, a bit like looking at generated assembly. Usually you'd just write in some hybrid of Python + natural language, but have the ability to look deeper.
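
As a rough sketch of that "LLMJitted" flow (everything here is hypothetical, including the placeholder standing in for a real LLM call):

  import functools

  def llm_generate_code(spec, name):
      # Placeholder: a real system would prompt an LLM with `spec` here.
      return f"def {name}(*args, **kwargs):\n    raise NotImplementedError"

  _cache = {}  # spec -> compiled function: the "instruction cache"

  def llm_jit(spec):
      def decorator(stub):
          @functools.wraps(stub)
          def wrapper(*args, **kwargs):
              if spec not in _cache:
                  namespace = {}
                  exec(llm_generate_code(spec, stub.__name__), namespace)
                  _cache[spec] = namespace[stub.__name__]  # "compile" once
              return _cache[spec](*args, **kwargs)  # fast path thereafter
          return wrapper
      return decorator

  @llm_jit("return the n-th Fibonacci number")
  def fib(n): ...

Inspecting _cache would then be the "look at the generated assembly" step.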

I can also imagine some additional tooling that keeps track of good implementations of ideas that have been validated. This could extend to the community as a package manager. Throw in TRL + web of trust and... this could be wild.

Really tricky functions that the LLM can't solve could be delegated back for human implementation.


Nice! I can almost see your vision. In terms of tooling, I think this could be integrated with deep instrumentation (à la Datadog) and used to create self-improving systems.


I'm wondering if the post-condition checks change the perspective on this at all. Yes, the code is nondeterministic and may execute differently each time; that is exactly the problem this is trying to solve. You define validation rules that act as deterministic post-condition checks, and the call retries until validation passes (up to a max retry count). So even if the model changes, and the behavior of that model changes, the post-condition checks should theoretically catch that drift and correct the behavior until it fits the required output.
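
In pseudocode, that loop looks something like this (a minimal sketch; the names are mine, not the project's):

  def run_with_postconditions(task, checks, max_retries=3):
      for _ in range(max_retries):
          result = task()  # nondeterministic LLM-backed call
          if all(check(result) for check in checks):
              return result  # deterministic validation passed
      raise RuntimeError("post-conditions never satisfied within retry budget")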

I'm working on this. It's wild.

Or "institution", or "legal system", or "government."

To some extent, yes. Government in particular. Both of them "close the loop" in the sense that they are self-sustaining (corporations through revenue, governments through taxes). Some institutions can be self-sustaining, but many lack strong independent feedback loops. Legal systems are pretty much all dependent on a parent government, or very large corporate entities (think big multi-year contracts).

Oligarchy (Iron Law of)

I'd propose using our current view of physical reality to own a subset of the UUID, plus a version field in case new physics is discovered.

10-20 bits: version/epoch

10-20 bits: cosmic region

40 bits: galaxy ID

40 bits: stellar/planetary address

64 bits: local timestamp

This avoids the potentially pathological long chain of provenance, and also encodes coordinates into it.

Every billion years or so it probably makes sense to re-partition.
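
For concreteness, here's what packing those fields could look like (widths pinned to 16/16/40/40/64 bits purely for illustration; the ranges above are flexible):

  FIELDS = [("version", 16), ("region", 16), ("galaxy", 40),
            ("stellar", 40), ("timestamp", 64)]

  def pack(values):
      uid = 0
      for name, width in FIELDS:
          v = values[name]
          assert v < (1 << width), f"{name} overflows {width} bits"
          uid = (uid << width) | v
      return uid  # a 176-bit integer under these widths

  uid = pack({"version": 1, "region": 7, "galaxy": 123_456,
              "stellar": 42, "timestamp": 1_700_000_000})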


As for coordinates, don’t forget galaxies are clouds of stars flowing around and interacting with each other.

That's the problem with address-type systems: they expect the object at a location to always remain at that location. How do you encode the orbital speed and orbital radius, not just for the object itself, but for the object it is orbiting (which needs the same info, as it is also in motion), and then that object's parent galaxy's motion? Ugh, now I need a nap to calm down a bit.

You could estimate when the object was labelled by the coordinates used.

But where is the Greenwich meridian for the Milky Way?


  offset length
  00     04:    Version + Flags
  04     08:    Timestamp (uint64)
  12     16:    Node/Agent Hash
  28     16:    Namespace Hash
  44     32:    Random Entropy
  76     20:    Extra / Extension
  96     32:    Integrity Hash

Total: 128 bytes
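
A quick sketch of constructing such an ID (field semantics are my guesses from the names; here the integrity hash covers the preceding 96 bytes):

  import os, time, struct, hashlib

  def make_id(node: bytes, namespace: bytes, extra: bytes = b"") -> bytes:
      body = struct.pack(">I", 1)                       # 4B  version + flags
      body += struct.pack(">Q", time.time_ns())         # 8B  timestamp (uint64)
      body += hashlib.sha256(node).digest()[:16]        # 16B node/agent hash
      body += hashlib.sha256(namespace).digest()[:16]   # 16B namespace hash
      body += os.urandom(32)                            # 32B random entropy
      body += extra.ljust(20, b"\0")[:20]               # 20B extra / extension
      return body + hashlib.sha256(body).digest()       # 32B integrity hash

  assert len(make_id(b"node-1", b"ns")) == 128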

I love the idea of chat.md.

I'm developing a personal text editor with vim keybindings and paused work because I couldn't think of a good interface that felt right. This could be it.

I think I'll update my editor to do something like this but with intelligent "collapsing" of extra text to reduce visual noise.


Cool! Please share your work if possible!

I couldn't decide on folding and reducing noise, so I'm stuck on that front. I believe there is some elegant solution that I'm missing; I hope to see your take.


I've found LLMs (since Opus 4.5) exceptionally good at reading and writing and debugging assembly.

Give them gdb/lldb and have your mind blown!


Do you mean gdb batch mode (which I've heard of others using with LLMs), or the LLM using gdb interactively?


I wrote a wrapper Python script, debug.py, that runs gdb as a subprocess and then takes input from the args.

Usage is somewhat like:

  $ debug.py start
  $ debug.py -c "break main"
  $ debug.py -c "continue"

Cursor at least doesn't seem to like running interactive programs yet.
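
A stripped-down sketch of the trick (simplified, with paths hardcoded): keep gdb alive reading from a FIFO so state persists across invocations.

  import os, subprocess, sys, time

  FIFO = "/tmp/gdb-session.fifo"
  LOG = "/tmp/gdb-session.log"

  def start(binary):
      if not os.path.exists(FIFO):
          os.mkfifo(FIFO)
      # tail -f holds the pipe open so gdb's stdin never sees EOF
      subprocess.Popen(f"tail -f {FIFO} | gdb {binary} > {LOG} 2>&1",
                       shell=True)

  def send(cmd):
      with open(FIFO, "w") as f:
          f.write(cmd + "\n")
      time.sleep(0.5)  # crude: give gdb a moment to print
      print(open(LOG).read())

  if sys.argv[1] == "start":
      start(sys.argv[2] if len(sys.argv) > 2 else "./a.out")
  elif sys.argv[1] == "-c":
      send(sys.argv[2])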


I don't think there is anything technically interesting.

I think it's socially interesting that people are interested in this. If these agents start using their limbs (e.g. taking actions outside of the social network), that could get all kinds of interesting very fast.


One of my favorite blog posts; I enjoy it every time I read it. I think a C package manager is a genuinely hard thing to get right outside of a niche.

I've written two of them in my life, and they... were fine. The most recent one is mildly better than the first from a decade ago, but still not quite right. If I ever build one I think is good enough I'll share it, only to most likely learn about 50 edge cases I didn't think of :)


I would love to ask the author: are you sure that large language models are only modeling language?


Whatever gets predicted by tokens gets summarized by symbols, which are artifacts of language. This gets at the illusory aspects of binary as well; the rabbit hole goes deep.


I haven't shouted into the void for a while. Today is as good a day as any other to do so.

I feel extremely disempowered that these coding sessions are effectively black boxes, and non-reproducible. It feels like I am coding with nothing but hopes and dreams, and the connection between my will and the patterns of energy is so tenuous I almost don't feel like touching a computer again.

A lack of determinism comes from many places, but primarily: 1) The models change 2) The models are not deterministic 3) The history of tool use and chat input is not available as a first-class artifact for use.

I would love to see a tool that logs the full history of all agents that sculpt a codebase, including the inputs to tools, tool versions, and any other sources of entropy. Logging the seeds fed into the RNGs that drive LLM output would be the final piece that would give me confidence to consider using these tools seriously.
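
To sketch what one such breadcrumb record might contain (field names are purely illustrative):

  record = {
      "session_id": "sess-0042",
      "model": {"name": "some-model", "version": "2025-01-15"},
      "sampler_seed": 123456789,
      "prompt_hash": "sha256:...",
      "tool_calls": [
          {"tool": "bash", "version": "5.2.21",
           "input": "pytest -q", "output_hash": "sha256:..."},
      ],
      "timestamp": "2025-01-15T12:00:00Z",
  }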

I write this now after what I am calling "AI disillusionment", a state where I feel so disconnected from my codebase I'd rather just delete it than continue.

Having a set of breadcrumbs would give me at least a modicum of confidence that the work was reproducible and not the product of some modern ghost, completely detached from my will.

Of course this would require actually owning the full LLM.


> A lack of determinism comes from many places, but primarily: 1) The models change 2) The models are not deterministic...

Models themselves are deterministic. This is a huge pet peeve of mine, so excuse the tangent, but the appearance of nondeterminism comes from a few sources, and imho can be largely attributed to the probabilistic methods used to get appropriate context and enable timely responses. Here's an example of what I mean: a 52-card deck. The deck order is fixed once you shuffle it. Drawing "at random" is a probabilistic procedure on top of that fixed state. We do not call the deck probabilistic; we call the draw probabilistic. Another example: a pot of water heating on a stove. Its temperature follows deterministic physics. A cheap thermometer adds noisy, random error to each reading. We do not call the water probabilistic; we call the measurement probabilistic.

Theoretical physicists run into such problems, albeit far more complicated ones, and the concept for how they deal with them is called ergodicity. The models at the root of LLMs do exhibit ergodic behavior: the time average and the ensemble average of an observable are identical, i.e. the average response of a single model over a long duration and the average of many similar models at a fixed moment are equivalent.


The previous poster is correct for a very slightly different definition of the word "model". In context, I would even say their definition is the more correct one.

They are including the random sampler at the end of the LLM that chooses the next token. You are talking about up to, but not including, that point. But that just gives you a list of possible output tokens with values ("probabilities"), not a single choice. You can always just choose the best one, or you could add some randomness that does a weighted sample of the next token based on those values. From the user's perspective, that final sampling step is part of the overall black box that is running to give an output, and it's fair to define "the model" to include that final random step.
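
A toy illustration of that split (the numbers are made up): the logits are a fixed, deterministic output, and only the final sampling step introduces randomness.

  import math, random

  logits = {"cat": 2.0, "dog": 1.5, "car": 0.1}  # deterministic model output

  def greedy(logits):
      return max(logits, key=logits.get)  # always "cat"

  def sample(logits, temperature=1.0):
      weights = [math.exp(v / temperature) for v in logits.values()]
      return random.choices(list(logits), weights=weights)[0]

  print(greedy(logits))  # identical on every run
  print(sample(logits))  # varies from run to run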


But, to be fair, simply calling the sampler random is what gives people impressions like the one OP is complaining about. Which isn't entirely accurate; the randomness is actually fairly bounded.

This plays back into my original comment, which you have to understand to see that the sampler, for all its "randomness", should only be seeing and picking from a variety of correct answers; i.e. the sample pool should contain only acceptable answers to "randomly" pick from. So when there are bad or nonsensical answers that are different every time, it's not because the models are too random, it's because they're dumb and need more training. Tweaking your architecture isn't going to fully prevent that.


The User:

The stove keeps burning me because I can't tell how hot it is; it feels random, and the indicator light is broken.

You:

The most rigorous definition of temperature is that it is equal to the inverse of the rate of change of entropy with respect to internal energy (1/T = ∂S/∂E), with volume V and particle number N held constant. All accessible microstates are equiprobable over a long period of time; this is the very definition of ergodicity! Yet, because of the flow of entropy, the observed macrostates will remain stable. Thus, we can say that the responses of a given LLM are...

The User:

I'm calling the doctor, and getting a new stove with an indicator light.


Well really, the reason why I gripe about it, to use your example, is that people then believe the malfunctioning indicator light is an intrinsic feature of stoves, so they throw their stove out and start cooking over campfires instead. Tried and true, predictable, whatever that means.

I think my deck of cards example still holds.

You could argue I'm being uselessly pedantic, that could totally be the case, but personally I think that's cope to avoid having to think very hard.


Here is a definitive scientific effort to nail down and solve non-determinism in LLM outputs (from Mira Murati's new outfit, but the credit really belongs to the author):

https://bff531bb.connectionism.pages.dev/blog/defeating-nond...


Requires a login?



It's also a pet peeve of mine, enough that I actually wrote a blog post about it:

https://hi-mil.es/blog/human-slop-vs-ai-slop


I share the sentiment. I would add that the people I would like to see use LLMs for coding (and other technical purposes) tend to be jaded like you, while the people I personally wouldn't want to see use LLMs for that tend to be pretty enthusiastic.


I've been building something like this: a markdown file that tracks your prompts and the code generated.

https://github.com/sutt/innocuous/blob/master/docs/dev-summa...

Check it out; I'd be curious to hear your feedback.


Maybe just take a weekend and build something by writing the code yourself. It's the feeling of pure creative power; it sounds like you've just forgotten what it was like.


Yeah, tbh I used to be a bit agentic coding tool-pilled, but over the past four months I've come to realize that if this industry evolves in a direction where I don't actually get to write code anymore, I'm just going to quit.

Code is the only good thing about the tech industry. Everything else is capitalist hellscape shareholder dystopia. Thinking on it, it's hilarious that any self-respecting coder is excited about these tools, because what you're excited for is a world where, at best, your entire job is managing unpredictable AI agents while sitting in meetings all day to figure out what to tell your AI agents to build. You don't get to build the product you want. You don't get to build it how you want. You'll be a middle manager who gets to orchestrate the arguments between the middle manager you already had and an inflexible computer.

You don't have to participate in a future you aren't interested in. The other day my boss asked me if I could throw Cursor at some task we've had backlogged for a while. I said "for sure my dude", then I just did it myself. It took me like four hours, and my boss was very impressed with how fast Cursor was able to do it, and how high quality the code was. He loves the Cursor metrics dashboard for "lines accepted" or whatever; every time he screenshares he has that tab open, so sometimes I task it on complicated nonsense tasks and then just throw away the results. Seeing the numbers go up makes him happy, which makes my life easier, so it's a win-win. Our CTO is really proud of "what percentage of our code is AI written", but I'm fairly certain that even the engineers who use it in earnest actually commit, like, 5% of what Cursor generates (and many do not use it in earnest).

The sentiment shift I've observed among friends and coworkers over the past two months has been insane. Literally no one cares about it anymore. The usage is still there, but it's a lot more of either my situation or just a "spray and pray" situation that creates a ton of disillusioned water-cooler conversations.


This pretty much sums up my experience.


If you care about this so much why don't you use one of the open source OpenAI models? They're pretty good and give you the guarantees you want.


None of the open-weight models are really as good as the SOTA stuff, whatever their evals say. Depending on the task at hand this might not actually manifest if the task is simple enough, but once you hit the threshold it's really obvious.


> where I feel so disconnected from my codebase I'd rather just delete it than continue.

If you allow your codebase to grow unfamiliar, even unrecognisable to you, that's on you, not the AI. Chasing some illusion of control via LLM output reproducibility won't fix the systemic problem of you integrating code that you do not understand.


Who cares about the blame; it would just be useful if the tools were better at this task in many particular ways.


It's not blame, it's useful feedback. For a large application you have to understand what different parts are doing and how everything is put together, otherwise no amount of tools will save you.


The process of writing the code, thinking all the while, is how most humans learn a codebase. Integrating alien code sequentially disrupts this process, even if you understand individual components. The solution is to methodically work through the codebase, reading, writing, and internalizing its structure, and comparing that to the known requirements. And yet, if this is always required of you as a professional, what value did the LLM add beyond speeding up your typing while delaying the required thinking?


I completely agree.


And now imagine you'd have to rely on humans to build your software instead


This is the question though, isn't it?

With sufficient structure and supervision, will a "team" of agents outperform a team of humans?

Military, automotive and other industries have developed rigorous standards consisting of among other things detailed processes for developing software.

Can there be an AI waterfall? With sufficiently unambiguous, testable requirements and a nice scaffolding of process, is it possible to achieve the dream of managers and eliminate software engineers? My intuition is evenly split.

