I've got a comment somewhere on HN that says exactly that: "try CPU inference first, it's pretty good".

The need to reach for a T4 comes when someone is running a big model on images or video and wants sub-second response times. (Think some of the stuff on Snapchat, etc.)
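
A rough way to check this for your own case is to time the model on CPU and compare against your latency budget. Here's a minimal sketch, not anyone's official benchmark: `fake_model`, `measure_latency`, `cpu_is_enough`, and the 1-second budget are all made up for illustration, and you'd swap `fake_model` for your real CPU inference call.

```python
import statistics
import time


def measure_latency(predict, sample, warmup=3, runs=20):
    """Return per-call latencies in seconds for predict(sample)."""
    for _ in range(warmup):
        predict(sample)  # warm caches / lazy init before timing
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        latencies.append(time.perf_counter() - start)
    return latencies


def cpu_is_enough(latencies, budget_s=1.0):
    """True if the approximate 95th-percentile latency fits the budget."""
    ordered = sorted(latencies)
    p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
    return p95 <= budget_s


def fake_model(pixels):
    # Hypothetical stand-in for a real CPU model call.
    return sum(p * p for p in pixels)


if __name__ == "__main__":
    lat = measure_latency(fake_model, list(range(10_000)))
    print(f"median {statistics.median(lat) * 1000:.2f} ms")
    print("CPU is enough" if cpu_is_enough(lat) else "consider a GPU such as a T4")
```

If the tail latency on realistic inputs already fits the budget, there's not much reason to pay for a GPU; if it doesn't, that's the point where a T4 starts to make sense.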