
> for no reason other than to minimally rice benchmarks.

For AI/ML applications, perhaps no one will notice.

For gaming, yielding threads of execution to the OS can periodically incur minimum scheduler delays of 10-20ms. Many gamers will notice roughly an extra frame of latency being randomly injected.



Sure, but CUDA is an AI/ML API, and in any case you're not doing blocking calls when writing a graphics engine. (Well, you'd better not.) Besides, these calls already busy-spin for a few milliseconds before yielding to the OS - it's just that you have to explicitly opt in to the yielding part. So these are the sorts of calls you'd use for high-throughput work, but they behave like calls designed for very-low-latency work. There is no point in shaving a few milliseconds off a call that already takes seconds, other than to make Nvidia look a few percent better in benchmarks. The tradeoffs are all wrong, and because nobody knows about it, megawatts of energy are being wasted.
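For anyone who hasn't run into this: the opt-in being described is the runtime's device scheduling flags. A minimal sketch of what that looks like, assuming the CUDA runtime API, with a made-up kernel and no error handling:

    // Sketch: opting out of spin-waiting via device scheduling flags.
    // Kernel, sizes, and the lack of error checking are illustrative only.
    #include <cuda_runtime.h>

    __global__ void work(float *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = out[i] * 2.0f + 1.0f;   // placeholder workload
    }

    int main() {
        // Ask the runtime to block the calling host thread on sync calls
        // instead of burning a CPU core spinning. Set this before doing
        // any other CUDA work on the device.
        cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);

        const int n = 1 << 20;
        float *buf = nullptr;
        cudaMalloc(&buf, n * sizeof(float));

        work<<<(n + 255) / 256, 256>>>(buf, n);
        cudaDeviceSynchronize();   // now sleeps in the OS while the GPU runs

        cudaFree(buf);
        return 0;
    }

With the default auto policy the runtime generally spins when there are spare cores; the blocking flag makes synchronization sleep instead, trading a bit of wake-up latency for an idle CPU.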


This is important if you are launching many kernels and orchestrating their execution from the CPU.
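Concretely, the pattern in question is something like the loop below: a host thread launching short kernels and synchronizing each iteration to decide what to do next. The kernel and the convergence check are invented for illustration; the point is just that every iteration pays the host wake-up cost:

    #include <cuda_runtime.h>

    __global__ void step(float *state, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) state[i] += 0.5f;   // stand-in for one short iteration of work
    }

    void orchestrate(float *d_state, float *h_probe, int n, int iters) {
        for (int k = 0; k < iters; ++k) {
            step<<<(n + 255) / 256, 256>>>(d_state, n);
            // The implicit sync in this copy is paid every iteration. If the
            // kernel runs for tens of microseconds, a multi-millisecond OS
            // wake-up per sync would dominate the loop, which is why the
            // runtime spins by default.
            cudaMemcpy(h_probe, d_state, sizeof(float), cudaMemcpyDeviceToHost);
            if (*h_probe > 1000.0f) break;   // hypothetical convergence check
        }
    }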


In that case (which tbh is kind of bad design imo), you should have to explicitly opt in to the power-hungry mode.


This is a thing that people want, hence the decision. Unfortunately those people pay Nvidia a lot more money than you do.


The thing is, it's really not hard to recognize this access pattern. Just bucket API call times and switch modes on the fly.

There is simply no excuse for an app that does 10 API calls a second to burn 100% CPU.
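For what it's worth, that adaptive policy isn't exotic. Here's a hypothetical sketch of what it could look like at the application level; cudaEventQuery is a real call, but all the bookkeeping is invented here and nothing like this is a built-in CUDA feature:

    #include <cuda_runtime.h>
    #include <chrono>
    #include <thread>

    // Hypothetical adaptive waiter: spin when sync calls arrive back-to-back,
    // sleep-poll when they are sparse (e.g. ~10 calls per second).
    struct AdaptiveSync {
        using clock = std::chrono::steady_clock;
        clock::time_point last_call = clock::now();
        double avg_gap_ms = 1000.0;   // smoothed time between sync calls

        void wait(cudaEvent_t done) {
            auto now = clock::now();
            double gap = std::chrono::duration<double, std::milli>(now - last_call).count();
            last_call = now;
            avg_gap_ms = 0.9 * avg_gap_ms + 0.1 * gap;   // exponential moving average

            if (avg_gap_ms < 1.0) {
                // Calls are frequent: latency matters, so spin.
                while (cudaEventQuery(done) == cudaErrorNotReady) { /* busy-wait */ }
            } else {
                // Calls are sparse: burning a core buys nothing; poll gently
                // and give the CPU back to the OS in between checks.
                while (cudaEventQuery(done) == cudaErrorNotReady)
                    std::this_thread::sleep_for(std::chrono::microseconds(200));
            }
        }
    };

The same bucketing could obviously live inside the driver instead, where it would benefit every application without anyone opting in.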



