
Yes. It's not incredibly rare; it's incredibly common. A huge percentage of queries to retail LLMs are things like "hello" and "what can you do", and with static system prompts the total context ends up identical.
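
A minimal sketch of why identical contexts matter. It assumes the optimization under discussion is reusing a stored completion whenever the full context (system prompt plus user message) matches exactly. `ExactContextCache` and `fake_generate` are hypothetical names, and the generator is a stub standing in for the GPU call. A real service would also have to account for sampling settings and cache eviction.

```python
import hashlib
from typing import Callable, Dict


class ExactContextCache:
    """Hypothetical exact-match cache: reuse a stored completion when the
    full context (system prompt + user message) is byte-for-byte identical."""

    def __init__(self, generate: Callable[[str, str], str]) -> None:
        self._generate = generate          # stand-in for the expensive GPU call
        self._store: Dict[str, str] = {}   # context hash -> stored completion
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(system_prompt: str, user_message: str) -> str:
        # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
        h = hashlib.sha256()
        for part in (system_prompt, user_message):
            data = part.encode("utf-8")
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)
        return h.hexdigest()

    def complete(self, system_prompt: str, user_message: str) -> str:
        key = self._key(system_prompt, user_message)
        if key in self._store:
            self.hits += 1                 # identical context: no GPU work
            return self._store[key]
        self.misses += 1
        result = self._generate(system_prompt, user_message)
        self._store[key] = result
        return result


# Usage: a static system prompt plus common openers repeats the same context.
calls = []


def fake_generate(system_prompt: str, user_message: str) -> str:
    calls.append(user_message)
    return f"reply #{len(calls)}"


SYSTEM = "You are a helpful assistant."
cache = ExactContextCache(fake_generate)
for msg in ["hello", "what can you do", "hello", "hello"]:
    cache.complete(SYSTEM, msg)
print(cache.hits, cache.misses)  # 2 2
```

An exact-match key only pays off when the whole context repeats, which is the case described here; any change to the system prompt or the message is a miss.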

It's worth maybe a 3% reduction in GPU usage. So call it a half billion dollars a year or so, for a medium to large service.

> It's worth maybe a 3% reduction in GPU usage. So call it a half billion dollars a year or so, for a medium to large service.

So if 3% is $500M, then annual spend is ~$16.7B. Is that medium-sized these days?
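
The back-of-envelope arithmetic as a quick check. The $500M and 3% inputs are the parent comment's rough estimates, not measured data, and `implied_annual_spend` is just an illustrative helper name.

```python
def implied_annual_spend(annual_savings_usd: float, fraction_saved: float) -> float:
    # If saving `fraction_saved` of GPU spend is worth `annual_savings_usd`,
    # total spend is the savings divided by that fraction.
    return annual_savings_usd / fraction_saved


spend = implied_annual_spend(500e6, 0.03)
print(f"${spend / 1e9:.1f}B")  # $16.7B
```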