I've got a comment somewhere on HN that says exactly that: "try CPU inference first, it's pretty good."
The need to reach for a T4 comes when someone is running a big model on images or video and wants sub-second response time. (Think some of the stuff on Snapchat, etc.)