So just to understand... this C code is capable of leveraging all the same transformations that PyTorch leverages on a GPU to read in a model, take input, and return output?
No. The C code can read in model weights, take input, and return output, but it runs on the CPU, not a GPU. Unlike PyTorch, it also can't run any other models: the architecture is hardcoded to Llama 2.
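
To make the CPU point concrete: where PyTorch would typically hand a matrix multiply to a GPU kernel, a plain C implementation just does it with nested loops on the CPU. Here is a minimal sketch of that idea. The name `matvec`, its signature, and the row-major layout are my own assumptions for illustration, not taken from the actual code.

```c
#include <stddef.h>

/* Hypothetical helper (illustration only): out = W * x.
 * W is a rows x cols matrix stored row-major in a flat array,
 * x has cols elements, out has rows elements.
 * Everything runs sequentially on the CPU; no GPU is involved. */
void matvec(float *out, const float *w, const float *x,
            size_t rows, size_t cols) {
    for (size_t i = 0; i < rows; i++) {
        float sum = 0.0f;
        /* Dot product of row i of W with the input vector x. */
        for (size_t j = 0; j < cols; j++) {
            sum += w[i * cols + j] * x[j];
        }
        out[i] = sum;
    }
}
```

A forward pass in this style is mostly a fixed sequence of calls like this, with the layer shapes read from the weight file. That fixed sequence is why such code is tied to one architecture, whereas PyTorch can build and run arbitrary models.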