We built an adapter (a small CLI interface) through which the LLM drives the app. It's conceptually similar to an integration test, just a bit more sophisticated.
The LLM gets a prompt describing the CLI commands it may use, plus its "personality", and then it does what it does.
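A minimal sketch of what such an adapter might look like; the command names, personality string, and helper functions are all hypothetical, not the actual implementation:

```python
# Hypothetical sketch of a CLI adapter for an LLM agent.
# Command names and the personality text are illustrative placeholders.
import shlex

ALLOWED_COMMANDS = {"status", "list-items", "add-item"}  # assumed whitelist
PERSONALITY = "You are a terse, helpful operator."        # assumed personality

def build_prompt() -> str:
    """Assemble the system prompt: personality plus the allowed CLI commands."""
    cmds = "\n".join(sorted(ALLOWED_COMMANDS))
    return f"{PERSONALITY}\nYou may run exactly these commands:\n{cmds}"

def dispatch(model_output: str):
    """Parse the model's reply; return argv only if the command is whitelisted."""
    argv = shlex.split(model_output.strip())
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        return None  # refuse anything outside the whitelist
    return argv
```

The key design point is the whitelist: the model can only invoke commands the adapter explicitly exposes, so it is sandboxed to the app's surface, much like a test harness.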
On the hardware side, I personally run 2x RTX 3090 cards on an AMD Threadripper 79xx platform with 128 GB RAM, which yields around 12 tokens/sec for Llama 3.3 or Qwen 2.5 72B (Q5_K_M), which is okay (prompt ingestion is roughly double that).
If you want to know more details, feel free to drop me a message (username at liku dot social)