mkl 11 months ago \| on: DeepSeek-R1: Incentivizing Reasoning Capability in...

800k. They say they came from earlier versions of their own models, with a lot of bad examples rejected. They don't seem to say which models they got the "thousands of cold-start" examples from earlier in the process though.