Hacker News | woodson's comments

HF is notorious for making it difficult to work offline (or at least for wasting time trying to connect when everything needed is already local), and it keeps changing how offline mode is handled. Previously there were TRANSFORMERS_OFFLINE, HF_DATASETS_OFFLINE, etc.
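For reference, a minimal sketch of forcing offline mode via environment variables; HF_HUB_OFFLINE is the newer hub-wide switch, and the older per-library variables mentioned above still work in many versions:

```shell
# Force Hugging Face libraries to use only the local cache.
# HF_HUB_OFFLINE covers huggingface_hub; the older per-library
# variables are kept for compatibility with existing setups.
export HF_HUB_OFFLINE=1
export TRANSFORMERS_OFFLINE=1
export HF_DATASETS_OFFLINE=1
# Optionally point at a pre-populated cache directory (path is an example):
export HF_HOME="$HOME/.cache/huggingface"
```

With these set, a load from an empty cache fails fast instead of hanging on a network timeout.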

> But I don't see a scenario where pytorch needs network access.

Training models across multiple compute nodes? That’s a big one.


Not if one runs it in a non-privileged VM/container with restricted network access. But everything is YOLO these days.

Forgive the tangent, but I'm just starting to learn about using AI for coding, and getting a safe sandbox is one of my next steps.

Any suggestions for a vm/container setup that works on a Linux host, provides the safety net you describe, and is still capable enough to try out all these things that people are talking about?


You can use devcontainers (in VSCode or standalone), for example: https://github.com/entn-at/claude-rust-devcontainer/

This limits what the agent can do on the system and which IPs/domains it can reach, but it requires a lot of customization for your specific framework/environment. Note that this can reduce the agent’s effectiveness, since it has to “work around” some of the limitations. It isn’t foolproof either; the agent could still exfiltrate data, e.g. via DNS requests.
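As a rougher alternative to a full devcontainer setup, a plain Docker invocation gives similar isolation. A minimal sketch, where the image name `agent-sandbox` is a placeholder for whatever image you build:

```shell
# Run the agent in a throwaway container with no network access at all.
# Swap --network none for a custom bridge network with firewall rules
# if the agent needs to reach an allowlisted API endpoint.
docker run --rm -it \
  --network none \
  --user "$(id -u):$(id -g)" \
  --read-only --tmpfs /tmp \
  -v "$PWD":/work -w /work \
  agent-sandbox
```

The DNS-exfiltration caveat above only applies once you open the network up; with `--network none` there is no route out at all.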


Easiest thing is to run your AI under a separate user identity, with its own home directory, and no sudo permission. Then it can't screw up your system or your own files.
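A minimal sketch of that setup on a typical Linux box; the user name `agent` and the CLI name are arbitrary examples:

```shell
# One-time setup: a dedicated, unprivileged user with its own home directory.
sudo useradd --create-home --shell /bin/bash agent

# Run the coding agent as that user; it can only write under /home/agent.
# (Replace 'claude' with whatever agent CLI you actually use.)
sudo -u agent -i claude
```

Note this protects your files and system packages, not the network; combine it with a firewall or container if you also want to restrict what the agent can reach.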

Why generative? Or has it been decided that only generative models are “AI”?

What kind of model “reproduces” things later for profit that is not generative?

Classification? Image recognition? Surveillance? There are plenty of non-generative use cases; not everything needs to be generative.

Surveillance models.

It depends a lot on the kind of data you’re processing (phone calls, podcasts, multi-person meetings recorded on a single channel?). For NVIDIA NeMo, check out their Sortformer diarization models (also available for streaming).

Well, but their customers are those who buy Apple hardware.


Look into RWKV.


Yeah, RWKV is definitely related in spirit (recurrent state for long context). Here I’m combining local windowed attention with a gated recurrent path plus KV cache compression, so it’s more of a hybrid than a full replacement for attention.


It’s a distinction that IMHO likely doesn’t make much difference, at least for the mostly automated/non-interactive coding agent use case. What matters more is how well the post-training on synthetic harness traces works.


There are still opportunities, but they aren’t paid nearly as well as less researchy positions in industry. US post-doc salaries at state universities aren’t that high.


You mean Moshi (https://github.com/kyutai-labs/moshi)? Since Personaplex is just a finetuned Moshi model.


Yeah, except Moshi doesn’t sound good at all.

