Created about 1 month ago
Inspired by the success of DeepSeek-R1, we explore the potential of rule-based reinforcement learnin...
🚀🚀 "Large Model": train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏 - jingyaogong/mi...
Reproduce R1 Zero on Logic Puzzle. Contribute to Unakar/Logic-RL development by creating an account ...
s1: Simple test-time scaling. Contribute to simplescaling/s1 development by creating an account on G...
A Blog post by Open R1 on Hugging Face
The Reinforcement Learning from Human Feedback Book
Reproduce the DeepSeek R1 "aha moment" and train an open model using reinforcement learning trying to te...