
Currently the C code does not invoke Python and there is no way to pass a prompt. It does not need any special permissions.


So, just to understand: is this C code capable of leveraging all the same transformations that PyTorch leverages on a GPU to read in a model, take input, and return output?


No. The C code can read in the model weights, take input, and return output, but it runs on the CPU, not the GPU. Unlike PyTorch, it also can't run any other models: the architecture is hardcoded to Llama 2.


Whoa. So no attempts at using a GPU, and it still performs that fast. That's bloody impressive. Kind of scary, actually.

Thanks for explaining.


This is code written in C which does the same calculations as other versions of Llama 2, such as the PyTorch one.

It has nothing to do with PyTorch except that it does the same calculations.
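
To make "the same calculations" a bit more concrete: much of the work in a transformer forward pass is multiplying weight matrices by vectors, and that needs nothing more than plain loops on a CPU. Here is a minimal sketch of the idea. It is not taken from the actual code; the names `matvec` and `argmax` and the row-major weight layout are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Matrix-vector product: out[i] = sum over j of w[i*n + j] * x[j].
   w is a d-by-n matrix stored row-major in one flat array (assumed layout). */
void matvec(float *out, const float *x, const float *w, size_t n, size_t d) {
    for (size_t i = 0; i < d; i++) {
        float acc = 0.0f;
        for (size_t j = 0; j < n; j++) {
            acc += w[i * n + j] * x[j];
        }
        out[i] = acc;
    }
}

/* Index of the largest value, e.g. to pick the highest-scoring output greedily. */
size_t argmax(const float *v, size_t n) {
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (v[i] > v[best]) {
            best = i;
        }
    }
    return best;
}
```

A PyTorch version would express the first function as a single matrix multiply and could dispatch it to a GPU; the arithmetic is the same either way.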



