Name: VibeThinker-3B
Author: Weibo

Learning Objectives

Understand what VibeThinker-3B is and why a 3 billion-parameter model drew outsized attention
Recognize the training ideas (curriculum fine-tuning, reinforcement learning, and self-distillation) behind its reasoning ability
Evaluate its benchmark claims with appropriate skepticism

What Is VibeThinker-3B?

VibeThinker-3B is an open-weights reasoning model released by Weibo AI in June 2026, described in the paper "VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models." It has just 3.1 billion parameters — small enough to run on a single consumer graphics card with about 6.7 gigabytes of memory — yet it posts math and coding-reasoning scores that rival models many times its size.
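The 6.7-gigabyte figure is easy to sanity-check. This is a minimal sketch that assumes weights are stored at 16-bit precision (2 bytes per parameter) and ignores the KV cache and activations. At that precision, 3.1 billion × 2 bytes ≈ 6.2 GB, so the quoted number presumably adds some runtime overhead on top of the raw weights. The precision table below is a generic assumption, not something the model card states.

```python
# Back-of-the-envelope estimate of the GPU memory needed just to hold model weights.
# Assumption: weight memory = parameter count x bytes per parameter. The KV cache,
# activations, and framework overhead (not modeled here) come on top of this.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,  # 16-bit floats, a common default for released checkpoints
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return weight memory in gigabytes (10**9 bytes) at the given precision."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    params = 3.1e9  # parameter count quoted for VibeThinker-3B
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_memory_gb(params, precision):.2f} GB")
```

Quantizing to 8 or 4 bits shrinks the weight footprint proportionally. Long reasoning traces make the KV cache grow with context length, though, so leave headroom beyond the weight estimate.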

The model is built on the Qwen2.5-Coder-3B base and released under the permissive MIT license, with weights and training code published on Hugging Face and GitHub. The headline idea is that careful post-training, not raw scale, can unlock strong step-by-step reasoning in a tiny model.

💡Key Concept

Why "small model" matters. Most frontier reasoning lives in models with hundreds of billions of parameters that need data-center hardware. A 3 billion-parameter model that reasons well can run on a laptop or a single GPU — cheaper to serve, easier to study, and far more accessible to students and independent developers.

How It Was Trained

VibeThinker-3B follows what its authors call the Spectrum-to-Signal Principle:

Curriculum fine-tuning — a two-stage supervised phase that starts with a broad spectrum of valid reasoning examples, then shifts to harder, longer problems.
Multi-domain reinforcement learning — a stage that amplifies correct reasoning using verifiable rewards, via a technique the team calls MaxEnt-Guided Policy Optimization (a variant of the GRPO objective used widely in reasoning models).
Offline self-distillation — a final pass that consolidates the model's best behaviors.
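The two-stage curriculum can be sketched as a data schedule. This is a hypothetical illustration. The Example record, the difficulty score, and both thresholds are our assumptions, because the description above does not say how the authors rank problems or where they draw the boundary between stages.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Example:
    prompt: str
    solution: str      # a verified step-by-step solution
    difficulty: float  # HYPOTHETICAL score in [0, 1], e.g. 1 - base-model pass rate


def two_stage_curriculum(
    pool: List[Example],
    hard_threshold: float = 0.7,   # illustrative, not a value from the paper
    min_solution_chars: int = 200,  # illustrative proxy for "longer" problems
    seed: int = 0,
) -> Tuple[List[Example], List[Example]]:
    """Split a fine-tuning pool into two stages.

    Stage 1: every valid example, shuffled, so the model sees a broad
    spectrum of reasoning paths.
    Stage 2: only hard examples with long solutions, hardest first.
    """
    rng = random.Random(seed)
    stage1 = list(pool)
    rng.shuffle(stage1)
    stage2 = [
        ex for ex in pool
        if ex.difficulty >= hard_threshold and len(ex.solution) >= min_solution_chars
    ]
    stage2.sort(key=lambda ex: ex.difficulty, reverse=True)
    return stage1, stage2
```

In this sketch you would fine-tune on stage 1 first and then continue training on stage 2. A hard problem with a short solution is excluded from stage 2 because the length filter also applies.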

The bet is that diversity in the fine-tuning data plus reward-driven reinforcement can elicit large-model reasoning from a small base.
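The reinforcement stage builds on GRPO. Its core idea is to sample a group of answers per problem, score each with a verifiable reward, and normalize the rewards within the group instead of training a separate value model. The sketch below shows that group-relative advantage. The maxent_weight function is a hypothetical reading of the "MaxEnt-Guided" name. It up-weights problems the model solves about half the time, where the pass/fail outcome has maximum entropy. Its formula and the lam knob are our assumptions, not the paper's exact objective.

```python
import math
from statistics import mean, pstdev
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: score each sampled answer relative to the other
    answers drawn for the same problem (no learned value model)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def bernoulli_entropy(p: float) -> float:
    """Entropy in nats of a pass/fail outcome with success probability p."""
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def maxent_weight(pass_rate: float, lam: float = 2.0, eps: float = 1e-6) -> float:
    """HYPOTHETICAL weight: 1.0 at a 50% pass rate, shrinking toward 2**-lam as
    the rate approaches 0 or 1. Uses KL(Bernoulli(p) || Bernoulli(0.5)) = ln 2 - H(p)."""
    p = min(max(pass_rate, eps), 1 - eps)  # clamp to avoid log(0)
    kl = math.log(2) - bernoulli_entropy(p)
    return math.exp(-lam * kl)


def weighted_advantages(rewards: List[float], lam: float = 2.0) -> List[float]:
    """Scale one problem's group advantages by its hypothetical MaxEnt weight.
    Assumes binary verifiable rewards (1 = correct, 0 = wrong)."""
    pass_rate = sum(rewards) / len(rewards)
    w = maxent_weight(pass_rate, lam)
    return [w * a for a in group_advantages(rewards)]
```

When every answer in a group is right, or every answer is wrong, the standard deviation is zero and all advantages vanish, so that problem contributes no gradient. Weighting toward pass rates near 50% concentrates training on the problems that still carry learning signal.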

Benchmark Claims (and Why to Be Careful)

On paper, VibeThinker-3B's reported scores are remarkable for its size:

Benchmark	Reported score	What it measures
AIME 2026 (math)	94.3 (97.1 with test-time scaling)	Hard competition mathematics
LiveCodeBench v6	80.2 Pass@1	Recent competitive-programming problems
LeetCode (unseen contests)	96.1% acceptance	Out-of-distribution code generalization
IFEval	93.4	Instruction following
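Pass@1 is usually computed by sampling several completions per problem and averaging. Below is a minimal sketch of the widely used unbiased pass@k estimator. It assumes n samples per problem, of which c pass the tests. The table does not say how many samples the authors drew, so this illustrates the metric, not their exact protocol.

```python
from math import comb
from typing import List, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples, drawn
    without replacement from n generated samples (c of them correct), is correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: List[Tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds one (n, c) pair per problem."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data: 3 problems, 8 samples each (illustrative, not real results)
    toy = [(8, 8), (8, 4), (8, 0)]
    print(benchmark_pass_at_k(toy, k=1))
```

For k = 1 the estimator reduces to c / n, the plain fraction of correct samples, so Pass@1 is an average success rate rather than a best-of-several score.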

Those numbers put a 3 billion-parameter model in the range of far larger systems on specific reasoning tasks — which is exactly why they have been contested. Independent observers have questioned whether the evaluation setup, test-time scaling, and benchmark selection flatter the model, and small models often generalize worse outside the narrow tasks they were tuned for. Treat VibeThinker-3B as a striking research result and a great model to experiment with — not as proof that a tiny model matches a frontier flagship in general use.
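The text does not specify which test-time scaling method produced the 97.1 AIME score. Majority voting (self-consistency) is one common form and illustrates the trade-off: sample several solutions and keep the most frequent final answer. A minimal sketch under that assumption:

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent final answer among sampled completions.
    Counter.most_common lists equal counts in first-seen order, so ties
    go to the answer that appeared first."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


def voted_accuracy(samples_per_problem: List[Sequence[str]], gold_answers: List[str]) -> float:
    """Fraction of problems where the majority answer matches the gold answer."""
    hits = sum(
        majority_vote(samples) == gold
        for samples, gold in zip(samples_per_problem, gold_answers)
    )
    return hits / len(gold_answers)
```

Voting over k samples costs roughly k times the inference compute of a single attempt. That is one reason a score reported with test-time scaling is not directly comparable to another model's single-sample score.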

Pricing

Plan	Price	Features
Open weights	Free	MIT license; weights and training code on Hugging Face and GitHub; self-host on a single consumer GPU

As an open-weights model, VibeThinker-3B is free to download, run, and modify. Your only cost is the hardware (or rented GPU time) you run it on — which, at this size, is minimal compared to frontier models.
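To put "minimal" in numbers, the serving cost per token follows from two figures you measure yourself: the hourly price of the GPU and the sustained generation throughput. This sketch uses placeholder inputs. The $0.50/hour price, 500 tokens/second, and 8,000-token answer length are hypothetical, not price quotes or benchmarks for this model.

```python
def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float) -> float:
    """USD per million generated tokens for one GPU kept busy at a steady throughput."""
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000


def cost_per_answer(gpu_hourly_usd: float, tokens_per_second: float, tokens_per_answer: int) -> float:
    """Reasoning models can emit long chains of thought, so price whole answers too."""
    per_million = cost_per_million_tokens(gpu_hourly_usd, tokens_per_second)
    return per_million * tokens_per_answer / 1_000_000


if __name__ == "__main__":
    # Placeholder inputs, not measured values
    print(f"${cost_per_million_tokens(0.50, 500):.3f} per million tokens")
    print(f"${cost_per_answer(0.50, 500, 8_000):.5f} per 8k-token answer")
```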

Strengths

Runs anywhere — about 6.7 gigabytes of GPU memory is enough, so it works on a single consumer card or a modest cloud instance
Strong reasoning for its size — competitive math and coding-reasoning scores that are unusual at 3 billion parameters
Fully open — MIT-licensed weights plus published training code make it easy to study and build on
A clean case study — the curriculum-plus-reinforcement recipe is a clear example of how post-training, not just scale, drives reasoning

Limitations and Considerations

Contested benchmarks — the most eye-catching scores are debated; verify on your own tasks before trusting them
Narrow strengths — tuned for math and coding reasoning; general knowledge, writing, and broad chat are not its focus
Small-model limits — 3 billion parameters cannot hold the world knowledge of a frontier model, and reasoning can break down outside its training distribution
Research artifact — released by a research team as a demonstration, not a supported commercial product with guarantees

Company Details

Detail	Info
Developer	Weibo AI
Released	June 2026
Parameters	3.1 billion (dense)
Base model	Qwen2.5-Coder-3B
License	MIT (open weights)
Availability	Hugging Face, GitHub

Key Takeaways

VibeThinker-3B is a 3.1 billion-parameter open reasoning model from Weibo AI that reports math and coding-reasoning scores well beyond what its size would suggest
It is built on Qwen2.5-Coder-3B and MIT-licensed, runnable on a single consumer GPU with about 6.7 gigabytes of memory
Its training recipe — curriculum fine-tuning, reinforcement learning, and self-distillation — shows how post-training can unlock reasoning without massive scale
The headline benchmark claims are strong but contested; treat it as a research demonstration and verify on your own tasks before relying on it
