Hacker News: convexstrictly's comments


"Just when you thought it was over... we’re introducing Gemini 2.0 Flash Thinking, a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts.

The model plans (with thoughts visible), can solve complex problems with Flash speeds, and more ..."

- Logan Kilpatrick

https://x.com/OfficialLoganK/status/1869789822384255300


GB per second


Simran Arora: "Join us for a livestream this Thursday, Halloween/Diwali, and join our channel on the GPU Mode Discord server to hang out with us/get involved:"

https://discord.com/login?redirect_to=%2Fchannels%2F11894982...


Livestream link: https://youtube.com/live/IAwLzkldxUk?feature=share (come ask questions!)


Thanks!


CUDA + ThunderKittens 4.5 hour tutorial

https://www.youtube.com/watch?v=xcpEl0cGCC4



Aider uses Tree-sitter to improve code generation: https://aider.chat/2023/10/22/repomap.html

Aider: https://github.com/paul-gauthier/aider

It is state of the art on SWE-Bench and SWE-Bench Lite. https://aider.chat/2024/06/02/main-swe-bench.html
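The repo-map idea is to hand the LLM a compact outline of the codebase (file names plus top-level definitions) rather than whole files. Aider builds this with Tree-sitter; as a rough illustration of the same idea, here is a minimal sketch using Python's built-in ast module instead (the function name and output format are my own, not Aider's):

```python
import ast

def repo_map_sketch(source: str, filename: str) -> list[str]:
    """List top-level function/class signatures, like a tiny repo map."""
    entries = []
    tree = ast.parse(source, filename=filename)
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            entries.append(f"{filename}: def {node.name}({args})")
        elif isinstance(node, ast.ClassDef):
            entries.append(f"{filename}: class {node.name}")
    return entries

code = "def add(a, b):\n    return a + b\n\nclass Point:\n    pass\n"
print(repo_map_sketch(code, "example.py"))
```

A real repo map would run this over every file in the repository and rank the results by relevance; Tree-sitter lets Aider do the same parse across many languages, not just Python.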


Candle is a minimalist ML framework for Rust with a focus on performance (including GPU support) and ease of use.

https://github.com/huggingface/candle


Love Candle! I actually ported Karpathy's previous GPT tutorial to candle, including training [0]

[0] https://www.perceptivebits.com/building-gpt-from-scratch-in-...


You can also target WASM. Depending on the model size, it can be really good. Here are some examples of quantized models.

Vision model: https://huggingface.co/spaces/radames/Candle-Moondream-2

BLIP image captioning: https://huggingface.co/spaces/radames/Candle-BLIP-Image-Capt...

Microsoft Phi-2: https://huggingface.co/spaces/radames/Candle-phi1-phi2-wasm-...
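Quantization is what makes these WASM demos practical: weights are stored as small integers plus a scale factor instead of full floats. A minimal sketch of symmetric per-tensor int8 quantization, one common scheme (not necessarily the exact one these demos use):

```python
# Symmetric int8 quantization: store a scale and small integers
# instead of full-precision floats.

def quantize_int8(weights):
    """Map floats to the int8 range with a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from the stored integers."""
    return [x * scale for x in q]

w = [0.5, -1.27, 0.02, 1.0]
q, s = quantize_int8(w)
w_hat = dequantize_int8(q, s)
print(q)      # integers in [-127, 127]
print(w_hat)  # approximate reconstruction of w
```

The payoff is 4x smaller downloads (int8 vs float32) at the cost of a small reconstruction error, which is why model size matters so much for in-browser inference.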


Not nearly as minimal as Karpathy's implementation.


I wouldn't call it "minimalist" after seeing Karpathy's code.


Candle focuses on inference though.


Candle dev here; we also support training/backprop! We certainly focus on optimizing inference performance, but hopefully that should improve training efficiency too.


What is inferencing?


Inference means using the neural net, as opposed to training it.

During inference you feed an input into the NN and it passes through in the "forwards" direction (i.e. from input to output), being transformed according to the "weights" that were learned during training, to produce the output.

During training, each training sample is first fed forwards through the NN, the same way as for inference. The output of the model (which at the beginning of training will be random/wrong) is then compared to the correct/desired output for that sample, and the resulting error value is fed backwards (from output to input) through the NN, using the "backpropagation" mechanism, to update the weights.

Training is a lot more involved than inference since it involves this backpropagation step.
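The forward/backward distinction above can be sketched on a single-weight "network" with squared error (a deliberately minimal example, not Candle code):

```python
# One-weight network: forward() alone is inference; train_step() runs the
# same forward pass, then backpropagates the error to update the weight.

def forward(w, x):
    return w * x  # inference: input flows forward through the weight

def train_step(w, x, target, lr=0.1):
    y = forward(w, x)        # forward pass, same as inference
    error = y - target       # compare output to desired output
    grad = error * x         # backprop: d(0.5 * error**2) / dw
    return w - lr * grad     # gradient-descent weight update

w = 0.0
for _ in range(50):
    w = train_step(w, x=2.0, target=6.0)
print(forward(w, 2.0))  # approaches the target 6.0
```

Note that train_step does everything forward does plus the gradient and update steps, which is the sense in which training is strictly more work than inference.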


The author claims better performance than LoRA in half the training time.

https://twitter.com/Rui45898440/status/1772996453557997924
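For context, LoRA freezes the base weight matrix W and trains only a low-rank update B @ A, so far fewer parameters change. A hedged sketch of the arithmetic with illustrative shapes (pure-Python matrices, not the paper's code):

```python
import random

def matmul(A, B):
    """Plain list-of-lists matrix multiply."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

random.seed(0)
# Frozen base weight W (4x4) and a rank-1 update: B (4x1) @ A (1x4).
W = [[random.random() for _ in range(4)] for _ in range(4)]
B = [[random.random()] for _ in range(4)]
A = [[random.random() for _ in range(4)]]

# Effective weight used at inference time: W' = W + B @ A.
W_eff = matadd(W, matmul(B, A))

# Only the 4 + 4 = 8 numbers in B and A are trained, vs 16 in W;
# the savings grow quadratically with the layer size.
trainable = len(B) * len(B[0]) + len(A[0])
print(trainable)
```

Methods claiming to beat LoRA typically change how this low-rank (or otherwise restricted) update is parameterized or optimized; the linked thread has the specifics.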


The federal government requests comments on regulation of AI models with openly available weights. The deadline is March 27, 2024.

Earlier thread: https://news.ycombinator.com/item?id=39494760

