Hacker News | nehalem's comments

The actually disturbing thing is that, given Next's track record of questionable security architecture, the author felt compelled to make the joke explicit.


There is an element of tragic comedy to these announcements. While the features are remarkable on their own, everybody knows that no new browser feature can be relied upon any time soon, because Apple does not ship continuous updates to the browsers it forces upon its users.


Doesn't iOS from two versions prior get the latest Safari?

I can't check because my wife's iPhone is, regrettably according to her, "updated to the latest glAss version".


I know one of my clients complained that something didn't work on their few-year-old iPad. So I don't know what the cutoff is, but clearly not everything updates regularly. He did try updating it manually too, but couldn't.


Safari got a big update last week.


Safari in general got an update, or Safari on only the devices Apple deems worthy? Usually Apple limits Safari updates to new phones.


Do you consider six-year-old phones new? What about seven-year-old Macs?


I think the iPhone X is the newest model that is no longer receiving iOS updates. That came out in 2017, so 8 years ago.


To me personally, it feels like Windows 2000 was the last and maybe only consistent UI onto which all later versions bolted what they considered improvements without ever overhauling the UI in full.


I think Windows XP did a pretty good job for the home market, making Windows appear friendly and easy to use to a wide audience (and without too many style inconsistencies).

Moreover, Windows XP let you switch the interface back to the classic 9x look, if you wanted a more serious appearance, and better performance.


> back to the classic 9x look

If I remember correctly, this is the Windows 2000 look.


We're both right. Windows XP had two different legacy themes: "Windows Standard" which looked like Windows 2000 and "Windows Classic" which looked like Windows 9x.


Totally agree!

Although I've been a Mac user for a long time, I still remember that I got work done using Windows 2000.

I'd buy a license and switch back to Windows if we could get the productivity of that UI back.

Typing this on iOS with Liquid Glass, which drives me nuts.


Windows 8 was a pretty big overhaul. But I agree with the author it was a most unwelcome overhaul.


Yeah, but many of its 'advanced' settings and such still pop up Windows 95-styled interfaces. And those are actually the most user-friendly parts of the OS.


I think one of the fundamental issues is "...to those raised on computers, rather than smartphones"


I am glad Vercel works on agents now. After all, Next is absolutely perfect and recommends them for greater challenges. /s


From AWS wrapper to OpenAI wrapper


How does it do with multi-column text and headers and footers?


We have trained the model on tables with hierarchical column headers and with rowspan and colspan greater than 1, so it should work fine. This is the reason we predict tables in HTML instead of Markdown.


Thank you. I was rather thinking of magazine-like layouts, with columns of text and headers and footers on every page holding the article title and page number.


It should work there also. We have trained on research papers with two columns of text. Generally, papers have references in the footer and contain page numbers.


I wonder what happened to Siri. Not a single mention anywhere?


"Hope to show you more later this year" was like the first thing they said about Apple Intelligence.


Which is the same as what they said last year.


I actually loved Siri when it first came out. It felt magical back then (in a way)


I wonder how this relates to MotherDuck (https://motherduck.com/)? They do "DuckDB-powered data warehousing" but predate this substantially.


MotherDuck is hosting DuckDB in the cloud. DuckLake is a much more open system.

With DuckLake you can build a petabyte-scale warehouse with multiple reader and writer instances, all transactional, on your own S3 and your own EC2 instances.

MotherDuck has limitations like only one writer instance, and read replicas can be up to a minute behind (not transactional).

Having different instances concurrently writing to different tables is not possible.

DuckLake gives proper separation of compute and storage with a transactional metadata layer.
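
Roughly, using it from Python looks something like this (a sketch based on the announcement; the bucket, file names, and table are made up, and exact option names may differ):

    import duckdb

    con = duckdb.connect()
    con.sql("INSTALL ducklake;")
    con.sql("LOAD ducklake;")

    # The catalog (metadata) lives in a transactional database -- here a local
    # DuckDB file; a Postgres catalog is what enables many concurrent writers --
    # while table data is written as Parquet files to object storage.
    con.sql("ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_PATH 's3://my-bucket/lake/');")

    # From here on it is plain SQL; each statement is a transaction against
    # the metadata layer.
    con.sql("CREATE TABLE lake.events (id BIGINT, payload VARCHAR);")
    con.sql("INSERT INTO lake.events VALUES (1, 'hello');")
    print(con.sql("SELECT count(*) FROM lake.events").fetchall())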


Just wondering, does DuckLake utilize Open Table Formats (OTFs)? I don't see it mentioned anywhere on the website.


No. DuckLake implements its own open table format (and the catalog above the table format). It is not utilizing the existing ones; it is an alternate implementation.


For what it's worth, MotherDuck and DuckLake will play together very nicely. You will be able to have your MotherDuck data stored in DuckLake, improving scalability, concurrency, and consistency while also giving access to the underlying data to third-party tools. We've been working on this for the last couple of months, and will share more soon.


I think a way to see it is: MotherDuck is a service you just throw your data at and they will sort it out (using DuckDB underneath), and you can use DuckDB to interface with your data. But if you want to be more "lakehouse", or maybe down the line there are more integrations with DuckLake, or you want your data stored in blob storage, you can use DuckLake with MotherDuck as the metadata store.


Not knowing much about special-purpose chips, I would like to understand whether chips like this would give Google a significant cost advantage over the likes of Anthropic or OpenAI when offering LLM services. Is similar technology available to Google's competitors?


GPUs are very good for pretraining but inefficient for inference.

Why?

For each new word a transformer generates, it has to move the entire set of model weights from memory to the compute units. For a 70-billion-parameter model with 16-bit weights, that requires moving approximately 140 gigabytes of data to generate just a single word.

GPUs have off-chip memory. That means a GPU has to push data across a chip-to-memory bridge for every single word it creates. This architectural choice is an advantage for graphics processing, where large amounts of data need to be stored but not necessarily accessed as rapidly for every single computation. It's a liability in inference, where quick and frequent data access is critical.
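
Back-of-the-envelope, that weight traffic alone caps single-stream decoding speed (the ~3 TB/s figure below is an assumed round number for an HBM-class accelerator, not a spec):

    # Rough numbers: 70B parameters at 2 bytes each, ~3 TB/s of HBM bandwidth (assumed).
    params = 70e9
    bytes_per_weight = 2                  # 16-bit weights
    hbm_bandwidth = 3e12                  # bytes/second, assumed

    bytes_per_token = params * bytes_per_weight           # ~140 GB read per token
    max_tokens_per_s = hbm_bandwidth / bytes_per_token     # bandwidth-limited ceiling

    print(f"{bytes_per_token / 1e9:.0f} GB moved per generated token")
    print(f"~{max_tokens_per_s:.0f} tokens/s upper bound at batch size 1")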

Listening to Andrew Feldman of Cerebras [0] is what helped me grok the differences. Caveat, he is a founder/CEO of a company that sells hardware for AI inference, so the guy is talking his book.

[0] https://www.youtube.com/watch?v=MW9vwF7TUI8&list=PLnJFlI3aIN...


Cerebras (and Groq) have the problem of using too much die for compute and not enough for memory. Their method of scaling is to fan out the compute across more physical space. This takes more data center space, power, and cooling, which is a huge issue. Funny enough, when I talked to Cerebras at SC24, they told me their largest customers are for training, not inference. They just market it as an inference product, which is even more confusing to me.

I wish I could say more about what AMD is doing in this space, but keep an eye on their MI4xx line.


Thank you for sharing this perspective — really insightful. I’ve been reading up on Groq’s architecture and was under the impression that their chips dedicate a significant portion of die area to on-chip SRAM (around 220MiB per chip, if I recall correctly), which struck me as quite generous compared to typical accelerators.

From die shots and materials I’ve seen, it even looks like ~40% of the die might be allocated to memory [1]. Given that, I’m curious about your point on “not enough die for memory” — is it a matter of absolute capacity still being insufficient for current model sizes, or more about the area-bandwidth tradeoff being unbalanced for inference workloads? Or perhaps something else entirely?

I’d love to understand this design tension more deeply, especially from someone with a high-level view of real-world deployments. Thanks again.

[1] Think Fast: A Tensor Streaming Processor (TSP) for Accelerating Deep Learning Workloads — Fig. 5. Die photo of 14nm ASIC implementation of the Groq TSP. https://groq.com/wp-content/uploads/2024/02/2020-Isca.pdf


> is it a matter of absolute capacity still being insufficient for current model sizes

This. Additionally, models aren't getting smaller; they are getting bigger, and to be useful to a wider range of users they also need more context to go off of, which means even more memory.
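
To put rough numbers on it, using the 70B / 16-bit figure from upthread and the ~220 MiB of on-chip SRAM per Groq chip mentioned above (simple arithmetic, not vendor-published sizing):

    # How many SRAM-only chips does it take just to *hold* a 70B-parameter model?
    params = 70e9
    bytes_per_weight = 2                      # 16-bit weights
    sram_per_chip = 220 * 2**20               # ~220 MiB of on-chip SRAM per chip

    model_bytes = params * bytes_per_weight   # ~140 GB of weights
    chips_needed = model_bytes / sram_per_chip
    print(f"~{chips_needed:.0f} chips just for the weights, before any KV cache")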

Previously: https://news.ycombinator.com/item?id=42003823

It could be partially the DC, but look at the rack density... to get to an equal amount of GPU compute and memory, you need 10x the rack space...

https://www.linkedin.com/posts/andrewdfeldman_a-few-weeks-ag...

Previously: https://news.ycombinator.com/item?id=39966620

Now compare that to an NV72 and the direction Dell/CoreWeave/Switch are going in with the EVO containment... far better. One can imagine that AMD might do something similar.

https://www.coreweave.com/blog/coreweave-pushes-boundaries-w...


Thanks for the links — I went through all of them (took me a while). The point about rack density differences between SRAM-based systems like Cerebras or Groq and GPU clusters is now clear to me.

What I’m still trying to understand is the economics.

From this benchmark: https://artificialanalysis.ai/models/llama-4-scout/providers...

Groq seems to offer nearly the lowest prices per million tokens and nearly the fastest end-to-end response times. That's surprising because, in my understanding, speed (latency) and cost are trade-offs.

So I'm wondering: why can't GPU-based providers offer cheaper but slower (higher-latency) APIs? Or do you think Groq/Cerebras are pricing much below cost (loss-leader style)?


Loss leader. It is the Uber/Airbnb playbook: book revenue, regardless of economics, and then debt-finance against that. Hope one day to lock in customers, or raise prices, or sell the company.


> they told me their largest customers are for training, not inference

That is curious. Things are moving so quickly right now. I typed out a few speculative sentences then went ahead and asked an LLM.

Looks like Cerebras is responding to the market and pivoting towards a perceived strength of their product combined with the growth in inference, especially with the advent of reasoning models.


I wouldn't call it "pivoting" as much as "marketing".


There are several incorrect assumptions in this take. For one thing, 16-bit is not necessary. For another, 140 GB/token holds only if your batch size is 1 and your sequence length is 1 (no speculative decoding). Nobody runs LLMs like that on those GPUs; if you do, compute utilization becomes ridiculously low. With a batch size greater than 1 and speculative decoding, the arithmetic intensity of the kernels is much higher, and having the weights "off chip" is not that much of a concern.
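
A rough illustration of that amortization (it ignores KV-cache traffic and speculative decoding, and just shows how batching spreads one weight read over many tokens):

    # Each decode step reads all weights once but serves `batch` sequences,
    # so weight traffic per generated token shrinks linearly with batch size.
    params = 70e9
    bytes_per_weight = 2                      # 16-bit weights

    for batch in (1, 8, 64):
        weight_bytes_per_token = params * bytes_per_weight / batch
        flops_per_token = 2 * params          # ~2 FLOPs per parameter per token
        intensity = flops_per_token / weight_bytes_per_token
        print(f"batch={batch:3d}: {weight_bytes_per_token / 1e9:6.1f} GB of weight "
              f"traffic per token, arithmetic intensity ~{intensity:.0f} FLOPs/byte")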


The Groq interview was good too. Seems that the thought process is that companies like Groq/Cerebras can run the inference, and companies like Nvidia can keep/focus on their highly lucrative pretraining business.

https://www.youtube.com/watch?v=xBMRL_7msjY


Anthropic is using Google TPUs. They are also jointly working with Amazon on a data center using Amazon's custom AI chips. And Google and Amazon are both investors in Anthropic.

https://www.datacenterknowledge.com/data-center-chips/ai-sta...

https://www.semafor.com/article/12/03/2024/amazon-announces-...


NVIDIA operates at a 70% profit margin right now. Not paying that premium and having an alternative to NVIDIA is beneficial. We just don't know by how much.


I might be misremembering here, but Google's own AI models (Gemini) don't use NVIDIA hardware in any way, for training or inference. Google bought a large amount of NVIDIA hardware only for Google Cloud customers, not for themselves.


Google has a significant advantage over other hyperscalers because Google's AI data centers are much more compute cost efficient (capex and opex).


Because of the TPUs, or due to other factors?

What even is an AI data center? Are the GPU/TPU boxes in a different building than the others?


Lots of other factors. I suspect this is one of the reasons why Google cannot offer the TPU hardware itself outside of their cloud service. A significant chunk of TPU efficiency can be attributed to external factors which customers cannot easily replicate.


> Because of the TPUs, or due to other factors?

Google does many pieces of the data center better. Google TPUs use 3D torus networking and are liquid cooled.

> What even is an AI data center?

Being newer, AI installations have more variations/innovation than traditional data centers. Google's competitors have not yet adopted all of Google's advances.

> are the GPU/TPU boxes in a different building than the others?

Not that I've read. They are definitely bringing on new data centers, but I don't know if they are initially designed for pure-AI workloads.


Wouldn't a 3D torus network have horrible performance with 9,216 nodes? And really horrible latency? I'd have assumed a traditional spine-leaf design would do better. But I must be wrong, as they're claiming their latency is great here. Of course, they provide zero actual evidence of that.

And I'll echo, what even is an AI data center, because we're still none the wiser.


A 3D torus is a tradeoff in terms of wiring complexity/cost and performance. When node counts get high you can't really have a pair of wires between every pair of nodes, so if you don't use a torus you usually need a stack of switches/routers aggregating traffic. Those mid-level and top-level switches/routers get very expensive (high-bandwidth cross-section) and the routing can get a bit painful. A 3D torus has far fewer cables, the routing can be really simple ("hop vertically until you reach your row, then hop horizontally to reach your node"), and the wrap-around connections are nice.
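
A toy sketch of that dimension-ordered rule and why the wrap-around links matter (the 16x16x36 shape below is just a made-up way to get 9,216 nodes, not the real topology):

    # Dimension-ordered routing on a 3D torus: walk x, then y, then z.
    # Wrap-around links cap the distance in each dimension at size/2.
    def torus_hops(src, dst, shape):
        hops = 0
        for s, d, n in zip(src, dst, shape):
            delta = abs(d - s)
            hops += min(delta, n - delta)   # take the shorter way around the ring
        return hops

    shape = (16, 16, 36)                                  # hypothetical 9,216-node torus
    print(torus_hops((0, 0, 0), (15, 15, 35), shape))     # corner-to-corner: 1 + 1 + 1 = 3
    print(torus_hops((0, 0, 0), (8, 8, 18), shape))       # worst case: 8 + 8 + 18 = 34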

That said, the torus approach was a gamble that most workloads would be nearest-neighbor, and allreduce needs extra work to optimize.

An AI data center tends to have enormous power consumption and cooling capabilities, with less disk and slightly different networking setups. But really it just means "this part of the warehouse has more ML chips than disks".


> most workloads would be nearest-neighbor

Thank you very much, that is the piece of the puzzle I was missing. Naively, it still seems (to me) like far more hops for a 3D torus than for a regular multi-level switch when you've got many thousands of nodes, but I can appreciate that it could be much simpler routing. Although I would guess that in practice it requires something beyond the simplest routing solution to avoid congestion.


Was that gamble wrong? I thought all LLM training workloads do collectives that involve all nodes (all-gather, reduce-scatter).


I think the choice they made, combined with some great software and hardware engineering, allows them to continue to innovate at the highest level of ML research regardless of their specific choice within a reasonable dollar and complexity budget.


It's a data center with much higher power density. We're talking about 100 kW/rack, heading toward 1,000 kW/rack, vs 20 kW/rack for a traditional data center, requiring much different cooling and power delivery.


> what even is an AI data center

A data center that runs significant AI training or inference loads. Non-AI data centers are fairly commodity; Google's non-AI efficiency is not much better than Amazon's or anyone else's. Google is much more efficient at running AI workloads than anyone else.


> Google's non-AI efficiency is not much better than Amazon or anyone else.

I don't think this is true. Google has long been a leader in efficiency. Look at power usage effectiveness (PUE). A decade ago Google announced average PUEs around 1.12, while the industry average was closer to 2.0. From what I can tell, they reported a 1.1 fleet-wide average last year. They've been more transparent about this than any of the other big players.

AWS is opaque by comparison, but they report 1.2 on average. So they're close now, but that's after a decade of trying to catch up to Google.

To suggest the rest of the industry is on the same level is not at all accurate.

https://en.wikipedia.org/wiki/Power_usage_effectiveness

(Amazon isn't even listed in the "Notably efficient companies" section on the Wikipedia page).
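
For concreteness, PUE is total facility power divided by IT equipment power, so for the same (assumed) 10 MW of IT load:

    # PUE = total facility power / IT equipment power.
    it_load_mw = 10                      # assumed IT load, just for illustration
    for pue in (1.1, 1.2, 2.0):
        total_mw = it_load_mw * pue
        overhead_mw = total_mw - it_load_mw
        print(f"PUE {pue}: {total_mw:.0f} MW total, {overhead_mw:.0f} MW of cooling/overhead")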


A decade ago seems like a very long time.

We've seen the rise of OSS Kubernetes and eBPF networking since, and a lot more that I don't have on-stack rn.

I wouldn't be surprised if everyone else had significantly closed the hardware utilization gap.


Nvidia has ~60% margins on their datacenter chips, so TPUs have quite a bit of headroom to save Google money without being as good as Nvidia GPUs.

No one else has access to anything similar; Amazon is just starting to scale their Trainium chips.


Microsoft has the MAIA 100 as well. No comment on their scale/plans though.


There are other AI/LLM-specific chips out there, yes. But the thing about ASICs is that you need one for each *specific* task. Eventually we'll hit an equilibrium, but for now, the stuff that Cerebras is best at is not what TPUs are best at is not what GPUs are best at…


I don't even know if eventually we'll hit an equilibrium.

The end of Moore's law pretty much dictates specialization; it's just more apparent first in fields without as much ossification.


This looks great!

Is anyone familiar with something similar for Python in general and Django in particular?


also interested!


I wonder whether the search algorithm would need to be (and could be?) adjusted to respond to the increased probability of players picking numbers that are hard to find with standard binary search.

