We built an adapter (a small CLI interface) through which the LLM drives the app. It's conceptually similar to an integration test, just a bit more sophisticated.
The LLM gets a prompt describing the CLI commands it may use, plus its "personality", and then it does what it does.
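A minimal sketch of what such an adapter might look like; the command names, personality string, and helper functions are all hypothetical, not the actual implementation:

```python
# Hypothetical sketch of a CLI adapter for an LLM agent.
# Command names and the personality text are illustrative placeholders.
import shlex

ALLOWED_COMMANDS = {"status", "list-items", "add-item"}  # assumed whitelist
PERSONALITY = "You are a terse, helpful operator."        # assumed personality

def build_prompt() -> str:
    """Assemble the system prompt: personality plus the allowed CLI commands."""
    cmds = "\n".join(sorted(ALLOWED_COMMANDS))
    return f"{PERSONALITY}\nYou may run exactly these commands:\n{cmds}"

def dispatch(model_output: str):
    """Parse the model's reply; return argv only if the command is whitelisted."""
    argv = shlex.split(model_output.strip())
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        return None  # refuse anything outside the whitelist
    return argv
```

The key design point is the whitelist: the model can only invoke commands the adapter explicitly exposes, so it is sandboxed to the app's surface, much like a test harness.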
On the hardware side, I personally run 2x RTX 3090 cards on an AMD Threadripper 79xx platform with 128 GB RAM, which yields around 12 tokens/sec for Llama 3.3 or Qwen 2.5 72B (Q5_K_M), which is okay (prompt ingestion is roughly double that).
If you want to know more details, feel free to drop me a message (username at liku dot social)