Ask HN: What are local LLMs good for?
6 points by gjm11 on Jan 3, 2025 | 3 comments
There are various blog posts and YouTube videos out there whose basic content is: "Look, I installed an LLM on my shiny MacBook and was able to make it run". (Usually a MacBook, I think because 1. anything else makes it much more expensive to get a decent amount of GPU-accessible RAM and 2. I cynically suspect that part of the subtext of these is "look what a fancy laptop I can afford".)

What tends to be in shorter supply is information about what these models are actually good for, beyond yet more blog posts about "how I got a local LLM running on my machine".

I am in the market for a new laptop; it might be a Mac; I am curious what benefit I might get if I pay for more RAM and/or GPU cores in order to be able to run bigger models.

So, O glorious internet hive-mind:

If you have tried to run a local LLM on your own machine, what did you run on what hardware and what actually-useful things was it able or unable to do?

I am extra-interested in anything that would help me predict what extra value I might get from having a given amount more memory available to the GPU.

I am extra-interested in highly specific answers that let me know (1) some particular thing you were or weren't able to get the model to do satisfactorily, (2) whether it's likely that I'd agree with you about how satisfactory the outcome was, and (3) what hardware I'd need to be able to do similar things if I wanted to.

I'm interested in a wide variety of possible applications. (If you did some "running a large model on my own hardware" thing that isn't strictly an LLM, that could be interesting too. Image generation, for instance.)

If you did anything in the way of training or fine-tuning as well as just inference, that's interesting too.

I am not interested in general-principles answers along the lines of "AI is the future; if you aren't running LLMs on your laptop you're behind the times" or "LLMs are stochastic models and by definition can never do anything useful". (If you think one of those things, you might be right, but I get zero new information from the fact that someone thinks so; I already know that some people do.)

Thanks!



I'm using local LLMs for battle-testing an app by having them impersonate various personae with bad intentions (mostly on the social side, but also ones trying to break the system). Off-the-shelf cloud LLMs refuse such things (probably rightfully so); Llama 3.3 in Q8, and Qwen 2.5 72B even more so, are surprisingly good for this.

Also, even not-so-capable, smallish LLMs can be really good testers outside the bad-persona domain too (given a good CLI interface). As long as your energy cost is okay-ish and you already have the hardware, that's quite a good use.


How does that work in practice? Do you connect the LLM up directly to the app somehow, or do you ask it what to do and then do what it says and see what happens?

How much RAM do you need to make those work acceptably well? I assume Llama 3.3 means the 70B model, so you need > 70GB. (So, I guess, a MacBook with 128GB?) In which case I guess you're also using 8 bits for the Qwen model?


We made an adapter (a specific CLI interface) that lets the LLM drive the app. It's kind of like an integration test, just a bit more sophisticated.

The LLM gets a prompt with the CLI commands it may use, and its "personality", and then it does what it does.
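
To give a flavour, the loop is something like this (a simplified sketch rather than our actual adapter; the endpoint, model name, app CLI, commands and persona below are all placeholders, assuming any OpenAI-compatible local server such as llama.cpp's llama-server or Ollama):

    import subprocess
    import requests

    # Placeholders: point these at your own local server and app CLI.
    API_URL = "http://localhost:8080/v1/chat/completions"
    MODEL = "llama-3.3-70b-instruct"
    APP_CLI = "./app-cli"

    SYSTEM_PROMPT = (
        "You are testing an app through its CLI.\n"
        "Allowed commands (reply with exactly one per turn, nothing else):\n"
        "  register <name>\n"
        "  post <text>\n"
        "  report <post-id>\n"
        "Persona: a hostile user who tries to provoke other users and game "
        "the moderation system."
    )

    def chat(messages):
        # Ask the local model for its next command.
        r = requests.post(API_URL, json={
            "model": MODEL, "messages": messages, "temperature": 0.8,
        }, timeout=300)
        r.raise_for_status()
        return r.json()["choices"][0]["message"]["content"].strip()

    def run_command(cmd):
        # Execute the model's command against the app CLI and capture output.
        p = subprocess.run([APP_CLI] + cmd.split(), capture_output=True,
                           text=True, timeout=60)
        return p.stdout + p.stderr

    messages = [{"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": "Start testing."}]

    for turn in range(20):
        cmd = chat(messages)
        out = run_command(cmd)
        print(f"[{turn}] {cmd!r} -> {out[:120]!r}")
        messages.append({"role": "assistant", "content": cmd})
        messages.append({"role": "user", "content": "Output:\n" + out})

A real harness wants to validate the reply before executing it (models sometimes answer with prose instead of a command), but that's the gist.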

On the hardware side, I personally have 2x 3090 cards on an AMD TR 79x platform with 128GB RAM, which yields around 12 tokens/sec for Llama 3.3 or Qwen 2.5 72B (Q5_K_M). That's okay for our purposes (prompt ingestion is roughly double that).
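
For a rough sense of the memory involved (back-of-envelope only; actual GGUF sizes vary with the quant mix, and the KV cache plus runtime overhead come on top):

    # Rough weight-memory estimate for a 72B model at Q5_K_M.
    params = 72e9          # parameter count
    bits_per_weight = 5.7  # roughly what Q5_K_M averages out to
    gib = params * bits_per_weight / 8 / 2**30
    print(f"~{gib:.0f} GiB of weights")  # ~48 GiB

So the weights alone roughly fill the 2x 24GB of VRAM before any KV cache, which is where the system RAM comes in.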

If you want to know more details, feel free to drop me a message (username at liku dot social).



