> for no reason other than to minimally rice benchmarks.
For AI/ML applications, perhaps no one will notice.
For gaming, yielding threads of execution to the OS can periodically incur minimum scheduler delays of 10-20ms. Many gamers will notice an extra frame or so of latency being injected at random.
Sure, but CUDA is an AI/ML API, and anyway you're not doing blocking calls when writing a graphics engine. (Well, you'd better not be.) Besides, these calls will already busy-spin for a few milliseconds before yielding to the OS - it's just that you have to explicitly opt in to the yielding part. So these are the sorts of calls you'd use for high-throughput work, but they behave like calls designed for very-low-latency work. There is no point in shaving a few milliseconds off a call that takes on the order of seconds, other than to make NVIDIA look a few percent better in benchmarks. The tradeoffs are all wrong, and because nobody knows about it, megawatts of energy are being wasted.
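For reference, a minimal sketch of what that opt-in looks like with the CUDA runtime API: `cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync)` (or `cudaDeviceScheduleYield`) tells the driver to block/yield the calling thread on synchronization instead of spinning a CPU core. Whether the extra scheduler latency is acceptable depends on your workload; this is just illustrating the knob being discussed, not a recommendation for every case.

```c
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    // Opt in to blocking synchronization: the calling CPU thread sleeps
    // until the GPU signals completion instead of busy-spinning.
    // Must be set before the CUDA context is created on this device.
    cudaError_t err = cudaSetDeviceFlags(cudaDeviceScheduleBlockingSync);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDeviceFlags: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // ... launch kernels / async copies here ...

    // With the flag above, this no longer pins a CPU core at 100% for the
    // duration of the GPU work; the thread blocks until the device is idle.
    cudaDeviceSynchronize();
    return 0;
}
```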