Hacker News

Could be true.

Deepseek obviously trained on OpenAI outputs, which were themselves shaped by RLHF. It may be that we've now got all the human feedback necessary to move forward, and from here we can infinitely distil and generate new synthetic data from higher-parameter models.



> Deepseek obviously trained on OpenAI outputs

I’ve seen this claim but I don’t know how it could work. Is it really possible to train a new foundational model using just the outputs (not even weights) of another model? Is there any research describing that process? Maybe that explains the low (claimed) costs.


Probably not the whole model, but the first step was "fine-tuning" the base model on ~800 chain-of-thought examples.

Those were probably from OpenAI models. Then they used reinforcement learning to expand the reasoning capabilities.
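Mechanically, "training on another model's outputs" is just supervised learning on teacher-generated text: the student never sees the teacher's weights, only samples. A toy sketch of that sequence-level distillation idea, with a tiny bigram model standing in for both teacher and student (all names and numbers here are hypothetical, not anything from the DeepSeek paper):

```python
import random
from collections import Counter, defaultdict

random.seed(0)

# Toy "teacher": a fixed bigram distribution over a tiny vocabulary.
TEACHER = {
    "the": {"cat": 0.7, "dog": 0.3},
    "cat": {"sat": 0.9, "ran": 0.1},
    "dog": {"ran": 0.8, "sat": 0.2},
}

def teacher_sample(prev):
    """Sample the next token from the teacher's conditional distribution."""
    words, probs = zip(*TEACHER[prev].items())
    return random.choices(words, weights=probs)[0]

# Sequence-level distillation: the student only ever sees sampled outputs,
# never the teacher's weights or probabilities.
counts = defaultdict(Counter)
for _ in range(5000):
    for prev in TEACHER:
        counts[prev][teacher_sample(prev)] += 1

# Student = normalized counts over the teacher's samples.
student = {
    prev: {w: c / sum(nxt.values()) for w, c in nxt.items()}
    for prev, nxt in counts.items()
}

# With enough samples, the student recovers the teacher's preferences.
for prev in TEACHER:
    t_best = max(TEACHER[prev], key=TEACHER[prev].get)
    s_best = max(student[prev], key=student[prev].get)
    assert t_best == s_best
```

The same logic scales up: replace the bigram counts with gradient descent on a neural LM and the sampled pairs with (prompt, completion) text, and you have distillation via fine-tuning on another model's outputs.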


800k. They say those came from earlier versions of their own models, with a lot of bad examples rejected. They don't seem to say which models the "thousands of cold-start" examples from earlier in the process came from, though.
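That "bad examples rejected" step is rejection sampling: generate many candidates per problem, keep only the ones an automatic checker accepts, and train on the survivors. A minimal sketch, with a noisy arithmetic solver standing in for the model and exact-match verification standing in for the checker (all names hypothetical):

```python
import random

random.seed(0)

def noisy_solver(a, b):
    """Stand-in for a model: right ~60% of the time, off by one otherwise."""
    return a + b if random.random() < 0.6 else a + b + random.choice([-1, 1])

def rejection_sample(n_problems=200, k=4):
    """Keep only (problem, answer) pairs that pass the checker."""
    kept = []
    for _ in range(n_problems):
        a, b = random.randint(0, 99), random.randint(0, 99)
        for _ in range(k):  # up to k attempts per problem
            ans = noisy_solver(a, b)
            if ans == a + b:  # checker: exact-match verification
                kept.append(((a, b), ans))
                break
    return kept

dataset = rejection_sample()
# Every kept example is correct by construction, so the fine-tuning
# data is cleaner than the raw model outputs it was sampled from.
assert all(ans == a + b for (a, b), ans in dataset)
```

A solver that's right 60% of the time still yields a nearly-clean dataset after filtering, which is why this loop (sample, verify, fine-tune, repeat) can bootstrap reasoning quality.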


Every single model does/did this. Initially, fine-tuning required expensive hand-labeled outputs for RLHF. Generating your training data from a model trained that way inherently encodes its learned distributions and improves performance, which is why some models would call themselves ChatGPT despite not being OpenAI models.


Check the screenshot below re: training on OpenAI outputs. They've fixed this since, btw, but it's pretty obvious they used OpenAI outputs to train. I mean, all the OpenAI "mini" models are trained the same way. Hot take, but it feels like the AI labs are going to gatekeep more models and outputs going forward.

https://x.com/ansonhw/status/1883510262608859181



