Hacker News: fulafel's comments

A single-threaded benchmark better represents real performance, I'd argue. 10 Gbps is only 1.25 GB/s after all, and few applications use parallel streams.
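The conversion, as a quick sanity check (plain arithmetic, nothing assumed beyond the 10 Gbps figure):

```python
# Convert a 10 Gbps line rate to bytes per second.
line_rate_bps = 10e9               # 10 gigabits per second
bytes_per_sec = line_rate_bps / 8  # 1.25e9 bytes per second
print(bytes_per_sec / 1e9)         # 1.25  (decimal GB/s)
print(bytes_per_sec / 2**30)       # ~1.16 (binary GiB/s)
```

So a single stream has to sustain well over a gigabyte per second to saturate the link.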

I think the intention is to measure the adapter itself independent from the CPU/overall system.

Besides, I can’t think of a typical single threaded application that would use those data rates, can you?


Steam downloads

Steam download rates are throttled based on how fast the game can actually be installed, so it's a bit of an outlier.

Interesting conceptual parallel to vibecoding, where you also get something that works, and understanding how or why it works is a separate job.

You can't do it with zero kernel code for ISA devices, but if there was a pci busmouse, uio + uio_pci_generic would work for reading the mouse, and you'd use uinput to send the events to the input stack. If you're willing to make a little uio stub driver for the interrupt, you can do it for ISA. UIO is from 2006 or something.

tl;dr: it's there, but nobody is interested in reinventing ancient pre-PCI drivers, so there's no generic ISA plumbing.
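To make that concrete, a rough sketch of the userspace side. Assumptions: a device bound to uio_pci_generic showing up as /dev/uio0, 64-bit Linux struct layouts, and a uinput fd whose setup ioctls (UI_SET_EVBIT, UI_DEV_CREATE, ...) were already done, e.g. via python-evdev; the dx=1 payload is a placeholder, since the real mouse deltas would come from mmap'ing the device's BAR.

```python
import os
import struct

# evdev constants (from linux/input-event-codes.h)
EV_SYN, EV_REL = 0x00, 0x02
SYN_REPORT, REL_X = 0x00, 0x00

def pack_event(ev_type, code, value):
    # struct input_event on 64-bit Linux: struct timeval (two longs),
    # __u16 type, __u16 code, __s32 value. A zeroed timestamp is fine;
    # the kernel stamps events it delivers.
    return struct.pack("llHHi", 0, 0, ev_type, code, value)

def irq_loop(uio_path="/dev/uio0", uinput_fd=None):
    # Each blocking 4-byte read on a UIO fd returns the interrupt count;
    # with uio_pci_generic you re-enable the masked interrupt by writing
    # a 4-byte 1 back to the same fd.
    fd = os.open(uio_path, os.O_RDWR)
    while True:
        (count,) = struct.unpack("I", os.read(fd, 4))
        # ...here you'd read the actual mouse deltas from the device;
        # dx=1 is a placeholder...
        if uinput_fd is not None:
            os.write(uinput_fd, pack_event(EV_REL, REL_X, 1))
            os.write(uinput_fd, pack_event(EV_SYN, SYN_REPORT, 0))
        os.write(fd, struct.pack("I", 1))  # unmask for the next interrupt

if __name__ == "__main__" and os.path.exists("/dev/uio0"):
    irq_loop()
```

The point being: the interrupt handshake and the event injection are both plain file I/O, which is why no per-device kernel driver is needed on the PCI side.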


For anyone wondering about the rest of the X standards, they're at: https://www.itu.int/itu-t/recommendations/index.aspx?ser=X

For example from 2023: X.1095: Entity authentication service for pet animals using telebiometrics


"TPU 8t and TPU 8i deliver up to two times better performance-per-watt over the previous generation" sounds impressive, especially as the previous generation is so recent (2025).

Interesting that there's separate inference and training focused hardware. Do companies using NV hardware also use different hardware for each task or is their compute more fungible?


That training is compute-bound and inference is memory-bound is well-known, but I don't think Nvidia deployments typically specialize for one vs the other.

One reason is that most clouds/neoclouds don't own workloads, and want fungibility. Given that you're spending a lot on H200s and whatnot, it's good to also spend on the networking to make sure you can sell them to all kinds of customers. The Groq LPU in Vera Rubin is an inference-specific accelerator, and Cerebras is also inference-optimized, so specialization is starting to happen.


I can't answer for Nvidia, but AWS has its own training and inference chips, and word on the street is that the inference chips are too weak, so some companies are running inference on the training chips.

They stopped producing Inferentia altogether and are only investing in Trainium now. They also announced a partnership with Cerebras not long ago. That should give you a clue.

https://www.cerebras.ai/press-release/awscollaboration


> Interesting that there's separate inference and training focused hardware. Do companies using NV hardware also use different hardware for each task or is their compute more fungible?

Dedicated hardware will usually be faster, which is why, as certain things mature, they go from being complicated and expensive to being cheap and plentiful in $1 chips. This tells me Google has a much better grasp of its stack than people building on Nvidia, because Google owns everything from the keyboard to the silicon. They've iterated so much that they understand how to separate out different functions that compete with each other for resources.


The "training" chips will probably be quite usable for slower, higher-throughput inference at scale. I expect that to be quite popular eventually for non-time-sensitive uses.

Vera Rubin will have Groq chips focused on fast inference, so it points toward a trend. Also, with energy needs so high, why not reach for every feasible optimization?

Nvidia said in March that they're working on specialized inference hardware, but they don't have any right now. You can do inference on Nvidia's current hardware offerings, but it's not as efficient.

AMD has been doing inference chips for many years and is a leader in HPC.

https://www.amd.com/en/products/accelerators/instinct.html


Could they make safer and/or higher-dose acetaminophen pills if they included NAC?

In the comparison table, kuri-fetch is described as "no Chrome needed".

Yes, this is why I wrote this comment.

I did some (really shallow) research, and Lightpanda seems like a better solution for searching the web from an agent than a wrapper around the Chrome DevTools.

https://lightpanda.io/docs/open-source/usage


There is also my engine https://github.com/DioxusLabs/blitz (which, compared to Lightpanda, has layout and rendering but no JS execution).

Nice, this is great! Maybe we'll finally have a new browser in this 'AI age' :)

For others unfamiliar with Windows, according to https://learn.microsoft.com/en-us/windows-server/administrat... "High Performance" entails:

> Processors are always locked at the highest performance state (including "turbo" frequencies). All cores are unparked. Thermal output may be significant.


This isn't specific to Windows; it's basically the same terminology Linux uses.

There's no Linux mode spelled the same ("High Performance"), and I don't think Linux systems universally do this:

> Processors are always locked at the highest performance state (including "turbo" frequencies).

Unless performance state means something idiosyncratic in MS terminology.

Normally you'd want to let idle cores apply power-saving measures, including downclocking, to donate some unused power envelope to busy cores, increasing overall performance.
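The closest Linux knob is the cpufreq scaling governor; a quick way to inspect it (sysfs paths assumed; VMs and some kernels don't expose cpufreq at all):

```python
# Print each CPU's cpufreq scaling governor, if the kernel exposes one.
from pathlib import Path

governors = {
    p.parent.parent.name: p.read_text().strip()
    for p in Path("/sys/devices/system/cpu").glob(
        "cpu[0-9]*/cpufreq/scaling_governor"
    )
}
print(governors or "no cpufreq sysfs here (VM or unusual kernel)")
```

Writing `performance` into those files (as root) pins the highest P-state, which is the nearest analogue to the Windows plan quoted above.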

But this varies across Linux-based platforms. For example, on RHEL (https://docs.redhat.com/en/documentation/red_hat_enterprise_...):

> throughput-performance: A server profile optimized for high throughput that disables power-saving mechanisms. It also enables sysctl settings to improve the throughput performance of disk and network IO.

> accelerator-performance: A profile that contains the same tuning as the throughput-performance profile. Additionally, it locks the CPU to low C-states so that the latency is less than 100 us. This improves the performance of certain accelerators, such as GPUs.

> latency-performance: A server profile optimized for low latency that disables power-saving mechanisms and enables sysctl settings that improve latency. The CPU governor is set to performance and the CPU is locked to low C-states (by PM QoS).

Here the latency-performance profile sounds most like the Windows Server mode (but different from throughput-performance).
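These tuned profiles are just layered config. A custom profile building on latency-performance would look roughly like this (a sketch of the tuned.conf format from memory, not copied from RHEL's shipped profiles; the profile name is made up):

```ini
# /etc/tuned/my-latency/tuned.conf
[main]
summary=latency-performance with local tweaks
include=latency-performance

[cpu]
# roughly what latency-performance itself sets:
governor=performance
energy_perf_bias=performance
force_latency=1          # PM QoS: keep CPUs out of deep C-states
min_perf_pct=100
```

Activated with `tuned-adm profile my-latency`.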

Neither Macs nor other competing systems have on-die memory.

(Except for the caches, which everybody has)


Battery capacity of smartphones seems to double every ~8 years. The design space is adding more battery capacity, accepting reduced battery life, or using less power.
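Doubling every ~8 years works out to a fairly small compound rate; a quick check:

```python
# Annual growth rate implied by capacity doubling every 8 years.
doubling_years = 8
annual_growth = 2 ** (1 / doubling_years) - 1
print(f"{annual_growth:.1%} per year")
```

So roughly 9% a year, far slower than the growth in what the silicon around the battery can consume.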
