[–] 1 pt

The techniques they almost certainly used, like ensembling and fine-tuning, have existed for years. Congrats to China for finally figuring out fastai. I'm sure the knockoff model you wrote in 8 lines of code is sooooo great.

[–] 0 pt

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.

With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors.

However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing.

To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

[–] 1 pt

That explains what the funding was used for (cold start), but again, this problem was solved many years ago in fastai: https://medium.com/@jelaniwoods/fastai-lesson-4-collaborative-filtering-454064ffe0a2

[–] 0 pt

Do you think DeepSeek is as politically slanted as OpenAI's models?

I checked its data cut-off and it seems to be October 2023.