HF is notorious for making it difficult to work offline (or at least for wasting time trying to connect when everything needed is already local), and it keeps changing how offline mode is handled. Previously there were separate variables like TRANSFORMERS_OFFLINE, HF_DATASETS_OFFLINE, etc.
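For what it's worth, the combination that has worked for me looks roughly like this (just a sketch; which variables matter depends on your library versions, HF_HUB_OFFLINE being the newer unified switch):

    import os

    # Set these before importing the HF libraries; HF_HUB_OFFLINE is the
    # newer hub-wide switch, the other two cover older library versions.
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"
    os.environ["HF_DATASETS_OFFLINE"] = "1"

    from transformers import AutoModel, AutoTokenizer

    # local_files_only makes the intent explicit per call: load from the
    # local cache (~/.cache/huggingface by default) and never hit the network.
    tok = AutoTokenizer.from_pretrained("bert-base-uncased", local_files_only=True)
    model = AutoModel.from_pretrained("bert-base-uncased", local_files_only=True)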
Forgive the tangent, but I'm just starting to learn about using AI for coding, and getting a safe sandbox is one of my next steps.
Any suggestions for a vm/container setup that works on a Linux host, provides the safety net you describe, and is still capable enough to try out all these things that people are talking about?
This limits what the agent can do on the system and which IPs/domains it can reach. It requires a lot of customization to your specific framework/environment. Note that it can reduce the agent's effectiveness, since it has to "work around" some of the limitations. It isn't foolproof either; the agent could still exfiltrate data, e.g. via DNS requests.
Easiest thing is to run your AI under a separate user identity, with its own home directory, and no sudo permission. Then it can't screw up your system or your own files.
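If you want to script the launch, something along these lines works (assumes you've already created a dedicated user, e.g. with useradd --create-home ai-agent, and that the launcher starts as root so it can drop privileges; the user name is just an example):

    import os
    import pwd
    import subprocess

    AGENT_USER = "ai-agent"  # example name; whatever unprivileged user you created

    def run_as_agent(cmd):
        info = pwd.getpwnam(AGENT_USER)

        def demote():
            # runs in the child just before exec: drop to the agent's gid/uid
            os.setgid(info.pw_gid)
            os.setuid(info.pw_uid)

        env = {"HOME": info.pw_dir, "USER": info.pw_name, "PATH": "/usr/bin:/bin"}
        return subprocess.run(cmd, preexec_fn=demote, env=env, cwd=info.pw_dir)

    run_as_agent(["whoami"])  # prints "ai-agent"; your own files stay out of reach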
It highly depends on the sort of data you’re processing (phone calls, podcasts, multi-speaker meetings recorded on a single channel?). For NVIDIA/NeMo, check out their Sortformer diarization models (also available in streaming variants).
Yeah, RWKV is definitely related in spirit (recurrent state for long context). Here I’m combining local windowed attention with a gated recurrent path plus KV cache compression, so it’s more of a hybrid than a full replacement for attention.
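Not my actual implementation, but a toy PyTorch sketch of the general shape (banded local attention mixed with a GRU-style recurrent summary via a learned gate; the KV cache compression part is omitted, and all names/dimensions are made up):

    import torch
    import torch.nn as nn

    class HybridLocalRecurrentBlock(nn.Module):
        def __init__(self, d_model: int, n_heads: int, window: int):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.window = window
            # gated recurrent path: a GRU cell carries a running summary state
            self.rnn = nn.GRUCell(d_model, d_model)
            self.gate = nn.Linear(2 * d_model, d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            B, T, D = x.shape
            # band mask: token t may only attend to keys in [t - window, t]
            idx = torch.arange(T, device=x.device)
            dist = idx[None, :] - idx[:, None]           # key index minus query index
            mask = (dist > 0) | (dist < -self.window)    # True = blocked
            local, _ = self.attn(x, x, x, attn_mask=mask)

            # recurrent path: fold tokens one by one into a single state
            h = x.new_zeros(B, D)
            states = []
            for t in range(T):
                h = self.rnn(x[:, t], h)
                states.append(h)
            recurrent = torch.stack(states, dim=1)       # (B, T, D)

            # learned gate decides how much long-range summary to mix in
            g = torch.sigmoid(self.gate(torch.cat([local, recurrent], dim=-1)))
            return g * local + (1 - g) * recurrent

    block = HybridLocalRecurrentBlock(d_model=64, n_heads=4, window=16)
    out = block(torch.randn(2, 128, 64))                 # -> (2, 128, 64)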
It’s a distinction that IMHO likely doesn’t make much difference, at least for the mostly automated/non-interactive coding agent use case. What matters more is how well the post-training on synthetic harness traces works.
There are still opportunities, but they aren’t paid nearly as well as less researchy positions in industry. US post-doc salaries at state universities aren’t that high.