Hacker News

It is less about conversion and more about extending ANE support for transformer-style models or giving developers more control.

The issue is targeting specific hardware blocks. Once you convert with coremltools, Core ML takes over scheduling and offers only coarse control over where the model runs (CPU, GPU, or ANE), not per-op placement. Also, the ANE wasn't really designed with transformers in mind, so most LLM inference defaults to the GPU.
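To make the coarseness concrete: the only scheduling knob coremltools exposes is a single ComputeUnit choice applied to the whole model. A minimal sketch (the enum member names are coremltools' real ones; the load call and model path in the comment are illustrative assumptions):

```python
# The whole-model compute-unit options coremltools exposes (ct.ComputeUnit).
# There is no per-op "run this layer on the ANE" API -- Core ML decides
# placement opaquely within the chosen set.
COMPUTE_UNITS = {
    "ALL": "CPU, GPU, and ANE -- Core ML picks per-op, opaquely",
    "CPU_ONLY": "CPU only",
    "CPU_AND_GPU": "CPU plus GPU, never ANE",
    "CPU_AND_NE": "CPU plus Neural Engine, never GPU",
}

# Typical usage (requires coremltools and a converted model package):
#   import coremltools as ct
#   model = ct.models.MLModel("model.mlpackage",
#                             compute_units=ct.ComputeUnit.CPU_AND_NE)

for name, meaning in COMPUTE_UNITS.items():
    print(f"{name}: {meaning}")
```

Note that even CPU_AND_NE is only a constraint, not a guarantee: ops the ANE can't handle silently fall back to the CPU.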



The Neural Engine is optimized for power efficiency, not raw performance.

Look for Apple to add matmul acceleration to the GPU instead. That's how to truly speed up local LLMs.



