Hacker News

Said GPUs spend half the time just waiting for memory.


Yep, but they're still 50x faster than any FPGA.


Probably not B200 level, but better than you might expect:

https://www.positron.ai/

I believe a B200 is ~3x an H200 at Llama 3, so that puts the FPGAs at around 60% the speed of a B200?
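The relative-speed arithmetic above can be sketched directly; the per-chip rates below are illustrative assumptions (neither vendor benchmarks nor spec numbers), chosen only to show how the 60% figure falls out:

```python
# Back-of-envelope for the relative-throughput claim above.
# All numbers are assumptions for the sake of the ratio, not measurements.
h200 = 1.0         # normalize H200 decode throughput to 1x
b200 = 3.0 * h200  # assumed: B200 is ~3x an H200 on Llama-3-class decode
fpga = 1.8 * h200  # assumed: the FPGA box lands somewhat ahead of an H200

print(f"FPGA vs B200: {fpga / b200:.0%}")  # -> "FPGA vs B200: 60%"
```

Note the 60% only holds if the FPGAs are ~1.8x an H200; the parent comment claims they "outperform H200s" without saying by how much, so the ratio is a guess.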


I wouldn't trust any benchmarks on the vendor's site. Microsoft went down this path for years with FPGAs and wrote off the entire effort.


OK? I worked on those devices; those numbers are real. There's a reason they compare to the H200 and not the B200.

> I have worked with FPGAs that outperform H200s in Llama3-class models a while and a half ago


I'd like to know more. I expect these systems are 8x VH1782. Is that true? What's the theoretical math throughput? My expectation is that it isn't very high per chip. How is performance in the prefill stage, when inference is actually math-limited?
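The prefill-vs-decode distinction behind that question can be sketched with a roofline-style check: a phase is memory-bound when its arithmetic intensity falls below the hardware's FLOPs-to-bandwidth ratio. The hardware numbers here are illustrative assumptions, not any specific part's spec:

```python
# Roofline-style check: is an inference phase compute- or memory-bound?
# Illustrative hardware assumptions, not a real datasheet.
peak_flops = 1000e12  # 1 PFLOP/s of matmul throughput
mem_bw = 4e12         # 4 TB/s of memory bandwidth

def bound(tokens_per_weight_read, flops_per_token_byte=2):
    """Each weight byte read does ~2 FLOPs per token it serves, so
    arithmetic intensity scales with tokens processed per weight read."""
    intensity = flops_per_token_byte * tokens_per_weight_read
    ridge = peak_flops / mem_bw  # break-even intensity (FLOPs/byte)
    return "compute-bound" if intensity > ridge else "memory-bound"

print(bound(1))     # decode: 1 token per weight read  -> memory-bound
print(bound(4096))  # prefill: whole prompt per read   -> compute-bound
```

This is why decode throughput tracks memory bandwidth (the "waiting for memory" point upthread) while prefill exposes raw math throughput, which is exactly where a low-FLOPs chip would struggle.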


I was a software guy, sorry, but those token rates are correct; that's what was flowing through my software.

I believe there was a special deal on super-special FPGAs. There were DSPs involved.




