Ask HN: What are local LLMs good for?
6 points by gjm11 on Jan 3, 2025 | 3 comments
There are various blog posts and YouTube videos out there whose basic content is: "Look, I installed an LLM on my shiny MacBook and was able to make it run". (Usually a MacBook, I think because 1. anything else makes it much more expensive to get a decent amount of GPU-accessible RAM and 2. I cynically suspect that part of the subtext of these is "look what a fancy laptop I can afford".)

What tends to be in shorter supply is information about what these models are actually good for, beyond yet more blog posts about "how I got a local LLM running on my machine".

I am in the market for a new laptop; it might be a Mac; I am curious what benefit I might get if I pay for more RAM and/or GPU cores in order to be able to run bigger models.

So, O glorious internet hive-mind:

If you have tried to run a local LLM on your own machine, what did you run on what hardware and what actually-useful things was it able or unable to do?

I am extra-interested in anything that would help me predict what extra value I might get from having a given amount more memory available to the GPU.

I am extra-interested in highly specific answers that let me know (1) some particular thing you were or weren't able to get the model to do satisfactorily, (2) whether it's likely that I'd agree with you about how satisfactory the outcome was, and (3) what hardware I'd need to be able to do similar things if I wanted to.

I'm interested in a wide variety of possible applications. (If you did some "running a large model on my own hardware" thing that isn't strictly an LLM, that could be interesting too. Image generation, for instance.)

If you did anything in the way of training or fine-tuning as well as just inference, that's interesting too.

I am not interested in general-principles answers along the lines of "AI is the future; if you aren't running LLMs on your laptop you're behind the times" or "LLMs are stochastic models and by definition can never do anything useful". (If you think one of those things, you might be right, but I get zero new information from the fact that someone thinks so; I already know that some people do.)

Thanks!



I'm using local LLMs for battle-testing an app by having them impersonate various personae with bad intentions (mostly on the social side, but also ones trying to break the system). Off-the-shelf cloud LLMs refuse such things (probably rightfully so); Llama 3.3 in Q8, and Qwen 2.5 72B even more so, are surprisingly good for this.

Also, even not-so-capable, smallish LLMs can be really good testers outside the bad-persona domain too (given a good CLI interface). As long as your energy cost is okay-ish and you already have the hardware, that's quite a good use.


How does that work in practice? Do you connect the LLM up directly to the app somehow, or do you ask it what to do and then do what it says and see what happens?

How much RAM do you need to make those work acceptably well? I assume Llama 3.3 means the 70B model, so you need > 70GB. (So, I guess, a MacBook with 128GB?) In which case I guess you're also using 8 bits for the Qwen model?


We made an adapter (a specific CLI interface) that lets the LLM drive the app. It's kind of like an integration test, just a bit more sophisticated.

The LLM gets a prompt with the CLI commands it may use, and its "personality", and then it does what it does.
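
To give a flavour, the loop is something like this (a simplified sketch rather than our actual adapter; the endpoint, model name, app CLI, commands and persona below are all placeholders, assuming any OpenAI-compatible local server such as llama.cpp's llama-server or Ollama):

    import subprocess
    import requests

    # Placeholders: point these at your own local server and app CLI.
    API_URL = "http://localhost:8080/v1/chat/completions"
    MODEL = "llama-3.3-70b-instruct"
    APP_CLI = "./app-cli"

    SYSTEM_PROMPT = (
        "You are testing an app through its CLI.\n"
        "Allowed commands (reply with exactly one per turn, nothing else):\n"
        "  register <name>\n"
        "  post <text>\n"
        "  report <post-id>\n"
        "Persona: a hostile user who tries to provoke other users and game "
        "the moderation system."
    )

    def chat(messages):
        # Ask the local model for its next command.
        r = requests.post(API_URL, json={
            "model": MODEL, "messages": messages, "temperature": 0.8,
        }, timeout=300)
        r.raise_for_status()
        return r.json()["choices"][0]["message"]["content"].strip()

    def run_command(cmd):
        # Execute the model's command against the app CLI and capture output.
        p = subprocess.run([APP_CLI] + cmd.split(), capture_output=True,
                           text=True, timeout=60)
        return p.stdout + p.stderr

    messages = [{"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": "Start testing."}]

    for turn in range(20):
        cmd = chat(messages)
        out = run_command(cmd)
        print(f"[{turn}] {cmd!r} -> {out[:120]!r}")
        messages.append({"role": "assistant", "content": cmd})
        messages.append({"role": "user", "content": "Output:\n" + out})

A real harness wants to validate the reply before executing it (models sometimes answer with prose instead of a command), but that's the gist.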

On the hardware side, I personally have 2x 3090 cards on an AMD TR 79x platform with 128GB RAM, which yields around 12 tokens/sec for Llama 3.3 or Qwen 2.5 72B (Q5_K_M). That's okay for our purposes (prompt ingestion is roughly double that).
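
For a rough sense of the memory involved (back-of-envelope only; actual GGUF sizes vary with the quant mix, and the KV cache plus runtime overhead come on top):

    # Rough weight-memory estimate for a 72B model at Q5_K_M.
    params = 72e9          # parameter count
    bits_per_weight = 5.7  # roughly what Q5_K_M averages out to
    gib = params * bits_per_weight / 8 / 2**30
    print(f"~{gib:.0f} GiB of weights")  # ~48 GiB

So the weights alone roughly fill the 2x 24GB of VRAM before any KV cache, which is where the system RAM comes in.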

If you want to know more details, feel free to drop me a message (username at liku dot social).



