HF is notorious for making it difficult to work offline (or at least for wasting time trying to connect when everything needed is already local), and it keeps changing how offline mode is handled. Previously there were separate variables like TRANSFORMERS_OFFLINE, HF_DATASETS_OFFLINE, etc.
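For what it's worth, the combination that has worked for me looks roughly like this (just a sketch; which variables matter depends on your library versions, HF_HUB_OFFLINE being the newer unified switch):

    import os

    # Set these before importing the HF libraries; HF_HUB_OFFLINE is the
    # newer hub-wide switch, the other two cover older library versions.
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"
    os.environ["HF_DATASETS_OFFLINE"] = "1"

    from transformers import AutoModel, AutoTokenizer

    # local_files_only makes the intent explicit per call: load from the
    # local cache (~/.cache/huggingface by default) and never hit the network.
    tok = AutoTokenizer.from_pretrained("bert-base-uncased", local_files_only=True)
    model = AutoModel.from_pretrained("bert-base-uncased", local_files_only=True)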
Forgive the tangent, but I'm just starting to learn about using AI for coding, and getting a safe sandbox is one of my next steps.
Any suggestions for a vm/container setup that works on a Linux host, provides the safety net you describe, and is still capable enough to try out all these things that people are talking about?
This limits what the agent can do on the system and which IPs/domains it can reach. It requires a lot of customization to your specific framework/environment. Note that it can reduce the agent's effectiveness, since it has to "work around" some of the limitations. It isn't foolproof either; the agent could still exfiltrate data, e.g. via DNS requests.
Easiest thing is to run your AI under a separate user identity, with its own home directory, and no sudo permission. Then it can't screw up your system or your own files.
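If you want to script the launch, something along these lines works (assumes you've already created a dedicated user, e.g. with useradd --create-home ai-agent, and that the launcher starts as root so it can drop privileges; the user name is just an example):

    import os
    import pwd
    import subprocess

    AGENT_USER = "ai-agent"  # example name; whatever unprivileged user you created

    def run_as_agent(cmd):
        info = pwd.getpwnam(AGENT_USER)

        def demote():
            # runs in the child just before exec: drop to the agent's gid/uid
            os.setgid(info.pw_gid)
            os.setuid(info.pw_uid)

        env = {"HOME": info.pw_dir, "USER": info.pw_name, "PATH": "/usr/bin:/bin"}
        return subprocess.run(cmd, preexec_fn=demote, env=env, cwd=info.pw_dir)

    run_as_agent(["whoami"])  # prints "ai-agent"; your own files stay out of reach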
It highly depends on the sort of data you’re processing (phone calls, podcasts, multi-speaker meetings recorded on a single channel?). For NVIDIA/NeMo, check out their Sortformer diarization models (also available in streaming variants).
Yeah, RWKV is definitely related in spirit (recurrent state for long context). Here I’m combining local windowed attention with a gated recurrent path plus KV cache compression, so it’s more of a hybrid than a full replacement for attention.
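Not my actual implementation, but a toy PyTorch sketch of the general shape (banded local attention mixed with a GRU-style recurrent summary via a learned gate; the KV cache compression part is omitted, and all names/dimensions are made up):

    import torch
    import torch.nn as nn

    class HybridLocalRecurrentBlock(nn.Module):
        def __init__(self, d_model: int, n_heads: int, window: int):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.window = window
            # gated recurrent path: a GRU cell carries a running summary state
            self.rnn = nn.GRUCell(d_model, d_model)
            self.gate = nn.Linear(2 * d_model, d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            B, T, D = x.shape
            # band mask: token t may only attend to keys in [t - window, t]
            idx = torch.arange(T, device=x.device)
            dist = idx[None, :] - idx[:, None]           # key index minus query index
            mask = (dist > 0) | (dist < -self.window)    # True = blocked
            local, _ = self.attn(x, x, x, attn_mask=mask)

            # recurrent path: fold tokens one by one into a single state
            h = x.new_zeros(B, D)
            states = []
            for t in range(T):
                h = self.rnn(x[:, t], h)
                states.append(h)
            recurrent = torch.stack(states, dim=1)       # (B, T, D)

            # learned gate decides how much long-range summary to mix in
            g = torch.sigmoid(self.gate(torch.cat([local, recurrent], dim=-1)))
            return g * local + (1 - g) * recurrent

    block = HybridLocalRecurrentBlock(d_model=64, n_heads=4, window=16)
    out = block(torch.randn(2, 128, 64))                 # -> (2, 128, 64)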
It’s a distinction that IMHO likely doesn’t make much difference, at least for the mostly automated/non-interactive coding agent use case. What matters more is how well the post-training on synthetic harness traces works.
There are still opportunities, but they aren’t paid nearly as well as less researchy positions in industry. US post-doc salaries at state universities aren’t that high.