    DeepSeek R1 reproduce

    Ke Fang

    Created 3 months ago

    Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

    Inspired by the success of DeepSeek-R1, we explore the potential of rule-based reinforcement learnin...

    GitHub - jingyaogong/minimind: 🚀🚀 Train a 26M-parameter GPT from scratch in just 2h!

    🚀🚀 Train a 26M-parameter GPT from scratch in just 2h! - jingyaogong/mi...

    Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.

    A new tool that blends your everyday work apps into one.

    GitHub - Unakar/Logic-RL: Reproduce R1 Zero on Logic Puzzle

    Reproduce R1 Zero on Logic Puzzle. Contribute to Unakar/Logic-RL development by creating an account ...

    GitHub - simplescaling/s1: s1: Simple test-time scaling

    s1: Simple test-time scaling. Contribute to simplescaling/s1 development by creating an account on G...

    Open-R1: Update #1

    A Blog post by Open R1 on Hugging Face

    RLHF Book by Nathan Lambert

    The Reinforcement Learning from Human Feedback Book

    Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial

    Reproduce Deepseek R1 „aha moment“ and train an open model using reinforcement learning trying to te...
