boywitharupee on Oct 31, 2024 \| on: ThunderKittens: Simple, fast, and adorable AI kern...

So, these are hand-optimized primitives for specific models of NVIDIA GPUs? Do you still have to make launch/scheduling decisions to maximize occupancy? How does this approach scale to other target devices with specialized instruction sets and different architectures?