
so these are hand-optimized primitives for a specific model of NVIDIA GPU? do you still have to make launch/scheduling decisions yourself to maximize occupancy? and how does this approach scale to other target devices with specialized instruction sets and different architectures?

