DeepSeek R1 reproduce

Ke Fang

Created about 1 year ago

Inspired by the success of DeepSeek-R1, we explore the potential of rule-based reinforcement learnin...

🚀🚀 [Large model] Train a small 26M-parameter GPT entirely from scratch in just 2 hours! - jingyaogong/mi...

A new tool that blends your everyday work apps into one.

Reproduce R1 Zero on Logic Puzzle. Contribute to Unakar/Logic-RL development by creating an account ...

s1: Simple test-time scaling. Contribute to simplescaling/s1 development by creating an account on G...

A Blog post by Open R1 on Hugging Face

The Reinforcement Learning from Human Feedback Book

Reproduce the DeepSeek R1 "aha moment" and train an open model using reinforcement learning trying to te...
