Hacker News | justaboutanyone's comments

How does that work with multiple credit agencies?

No idea; it was back in the mid-to-late '90s.

Many banks are known for pulling from a particular one.

While pinyin might be "better", there's still a lot of room for something better than it.


At this point, it might be moot. Too many people are assuming it's still a closed-source thing and will dismiss it.

Due to its closed-source nature, every Mojo announcement I see, I think "whatever, next."

If the actual intent is to open-source it, just do it: dump whatever you have into a repo and call it 'beta'.


It does matter. It already has a pretty active community and thousands of people who follow the development closely; however, most won't commit until the entire language is fully opened, myself included.

Valuable technologies are not so easily dismissed.


This failed when I put in an Australian postal code.


Shell programming is high-density inter-language glue. You simply have more implementations to call out to, and so less to write.

I can trivially combine a tool written in Rust with one written in JS/Java/C/whatever without writing bindings.


The first thing a public SpaceX would want to do is sell off all the non-SpaceX crap.


A public SpaceX will still be run by Musk. A public SpaceX would have to sell assets like X for a huge loss given its debt load, which would also take a propaganda machine out of Musk’s hands.

They’re stuck with those assets.


This sort of thing will be great for the SpaceX IPO :/


Especially if contracts with SpaceX start being torn up because the various ongoing investigations and prosecutions of xAI are now ongoing investigations and prosecutions of SpaceX. And next, new lawsuits over the conflict of interest created by the merger.


Running llama.cpp rather than vLLM, it's happy enough to run the FP8 variant with 200k+ context using about 90 GB of VRAM.


Yeah, but what did you get for tok/sec there? Memory bandwidth is the limitation with these devices. With 4-bit I didn't get over 35-39 tok/sec, and averaged more like 30 when doing actual tool use with opencode. I can't imagine FP8 being faster.


You can run large-ish MoE models at good speeds, like gpt-oss-120b; it's snappy enough even with a big context.

But large and dense at the same time is a bit slow.
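A rough sketch of why the MoE decode feels so much snappier: during generation, a memory-bandwidth-bound machine has to stream roughly the *active* weights once per token, and gpt-oss-120b only activates about 5B of its ~117B parameters per token. The bandwidth and quantization figures below are assumptions for illustration, not measurements:

```python
# Memory-bound decode: tok/s is roughly bandwidth / bytes read per token.
# Bandwidth and bytes-per-param here are assumed, not measured.
def rough_decode_tps(bandwidth_gb_s, active_params_b, bytes_per_param):
    """Upper bound on tokens/sec, ignoring KV-cache reads and other overhead."""
    return bandwidth_gb_s / (active_params_b * bytes_per_param)

bw = 256  # GB/s, assumed unified-memory bandwidth
moe = rough_decode_tps(bw, 5.1, 0.5)      # ~5B active params at ~4-bit
dense = rough_decode_tps(bw, 72.0, 0.625) # dense 72B at ~5-bit
print(f"MoE ~{moe:.0f} tok/s vs dense ~{dense:.0f} tok/s")
```

It's only an upper bound, but the roughly 20x gap between the two estimates matches the "snappy vs a bit slow" experience.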

Running a local LLM will cost a load of money for something much slower than the API providers, though.


Makes sense regarding the MoE performance. I'm not sure the cost argument holds up for high-volume workloads, though. If you're running batch jobs 24/7, the hardware pays for itself in a few months compared to API opex. It really just comes down to utilization.
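A back-of-envelope version of that break-even, with every number a hypothetical placeholder (hardware price, batched throughput, API rate, power cost). The key assumption is that continuous batching pushes aggregate throughput well above the single-stream tok/s quoted upthread:

```python
# Break-even sketch: every figure here is a hypothetical assumption.
hardware_cost = 10_000     # USD, one-time (assumed)
batched_tps = 500          # aggregate tok/s with continuous batching (assumed)
api_price_per_mtok = 5.00  # USD per million output tokens (assumed)
power_cost_per_day = 2.00  # USD/day of electricity (assumed)

tokens_per_day = batched_tps * 86_400
api_cost_per_day = tokens_per_day / 1e6 * api_price_per_mtok
breakeven_days = hardware_cost / (api_cost_per_day - power_cost_per_day)
print(f"${api_cost_per_day:.0f}/day of API spend -> break-even in {breakeven_days:.0f} days")
```

At single-stream speeds (~27 tok/s) the same arithmetic gives years, not months, so the claim really does hinge on keeping the box saturated.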


Do you have specific t/s numbers for those dense models? I'm curious just how severe the memory bandwidth bottleneck gets in practice.

I'm not sure I agree on the cost aspect though. For high-volume production workloads the API bills scale linearly and can get painful fast. If you can amortize the hardware over a year and keep the data local for privacy, the math often works out in favor of self-hosting.


For Qwen2.5-72B-Instruct-Q5_K_M at 32k context, I fed it a 26k-token file (a truncated fiction novel) and asked it to summarize; it processed the input at 224 tok/s and generated output at 3 tok/s. Not really good enough for interactive use without frustration, not just from watching it reply but also from the long wait for it to actually read the book.

On the same hardware, gpt-oss-120b at 128k context: I fed it a longer version of the input (a whole novel, 97k tokens), and it processed the input at 1650 tok/s and generated output at 27 tok/s. Just fast enough, IMO.
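Turning those rates into wall-clock waits makes the difference concrete. The prefill and generation rates are the ones quoted above; the 500-token summary length is an assumption:

```python
# Wall-clock wait = prefill time + generation time, using the rates quoted above.
def wait_s(prompt_tok, prefill_tps, out_tok, gen_tps):
    """Seconds to read the prompt and seconds to write the response."""
    return prompt_tok / prefill_tps, out_tok / gen_tps

out_tok = 500  # assumed summary length

read, write = wait_s(26_000, 224, out_tok, 3)    # dense Qwen2.5-72B Q5_K_M
print(f"dense 72B: ~{read:.0f}s to read, ~{write:.0f}s to write")

read, write = wait_s(97_000, 1650, out_tok, 27)  # gpt-oss-120b (MoE)
print(f"gpt-oss-120b: ~{read:.0f}s to read, ~{write:.0f}s to write")
```

Roughly five minutes total for the dense model versus a bit over a minute for the MoE, despite the MoE reading almost four times as much text.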


We may as well have the LLMs use the hardest, most provably-correct language possible.

