Hacker News

It is less about conversion and more about extending ANE support for transformer-style models or giving developers more control.

The issue is targeting specific hardware blocks. Once you convert with coremltools, Core ML takes over scheduling and offers only coarse control over where the model runs (CPU, GPU, or ANE), not per-op placement. Also, the ANE wasn't really designed with transformers in mind, so most LLM inference defaults to the GPU.
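To make the coarseness concrete: the only scheduling knob coremltools exposes is a single ComputeUnit choice applied to the whole model. A minimal sketch (the enum member names are coremltools' real ones; the load call and model path in the comment are illustrative assumptions):

```python
# The whole-model compute-unit options coremltools exposes (ct.ComputeUnit).
# There is no per-op "run this layer on the ANE" API -- Core ML decides
# placement opaquely within the chosen set.
COMPUTE_UNITS = {
    "ALL": "CPU, GPU, and ANE -- Core ML picks per-op, opaquely",
    "CPU_ONLY": "CPU only",
    "CPU_AND_GPU": "CPU plus GPU, never ANE",
    "CPU_AND_NE": "CPU plus Neural Engine, never GPU",
}

# Typical usage (requires coremltools and a converted model package):
#   import coremltools as ct
#   model = ct.models.MLModel("model.mlpackage",
#                             compute_units=ct.ComputeUnit.CPU_AND_NE)

for name, meaning in COMPUTE_UNITS.items():
    print(f"{name}: {meaning}")
```

Note that even CPU_AND_NE is only a constraint, not a guarantee: ops the ANE can't handle silently fall back to the CPU.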



The Neural Engine is optimized for power efficiency, not raw performance.

Look for Apple to add matmul acceleration to the GPU instead. That's how to truly speed up local LLMs.



