6 min read·Updated June 10, 2026

North Mini Code

By Cohere

North Mini Code is Cohere's first agentic coding model and its first open-weights release — a 30 billion-parameter mixture-of-experts design that keeps just 3 billion parameters active, handles a 256,000-token context, and ships under Apache 2.0 on Hugging Face. Cohere reports 33.4 on the Artificial Analysis Coding Index and up to 2.8-times the output throughput of Devstral Small 2 on comparable hardware.

Learning Objectives

  • Understand what North Mini Code is and why it marks a strategic shift for Cohere
  • Evaluate the model's mixture-of-experts architecture and efficiency claims
  • Identify when an open-weights coding model is the right fit versus a hosted frontier model

What Is North Mini Code?

North Mini Code is Cohere's first agentic coding model and its first open-weights release, launched on June 10, 2026. For a company long known for enterprise retrieval and its closed Command model family, shipping an open-weights model under a permissive license is a notable strategic pivot — it puts Cohere in direct competition with open coding models from Mistral, Moonshot, and Microsoft.

The model is built for code generation, agentic software-engineering workflows, and terminal task execution — the "do work in a repository" use cases rather than single-shot autocompletion.

💡Key Concept

Agentic coding model: A model tuned to operate inside a software project — reading files, running commands, editing across a codebase, and iterating toward a goal — rather than just suggesting the next line. North Mini Code is designed to be driven by coding agents and terminal harnesses.
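The loop described above can be sketched in a few lines. This is a minimal illustration of the general agent pattern (propose an action, execute it, feed the observation back), not Cohere's harness: the tool names, the in-memory `FILES` dictionary, and the scripted policy standing in for the model are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# An action is a (tool_name, argument) pair; the names used here are illustrative only.
Action = Tuple[str, str]


def run_agent(
    policy: Callable[[List[str]], Action],
    tools: Dict[str, Callable[[str], str]],
    goal: str,
    max_steps: int = 8,
) -> List[str]:
    """Drive a model-like policy until it emits ("done", ...) or steps run out."""
    transcript = [f"goal: {goal}"]
    for _ in range(max_steps):
        tool_name, argument = policy(transcript)
        if tool_name == "done":
            transcript.append(f"done: {argument}")
            break
        if tool_name not in tools:
            observation = f"error: unknown tool {tool_name!r}"
        else:
            observation = tools[tool_name](argument)
        # The observation is appended so the next policy call can see it.
        transcript.append(f"{tool_name}({argument}) -> {observation}")
    return transcript


# Stand-ins for a real repository and a real model (both hypothetical).
FILES = {"app.py": "print('helo')"}


def read_file(path: str) -> str:
    return FILES.get(path, "error: no such file")


def fix_typo(path: str) -> str:
    FILES[path] = FILES[path].replace("helo", "hello")
    return "edited"


def scripted_policy(transcript: List[str]) -> Action:
    """A fixed script standing in for model output: read, edit, then stop."""
    script = [("read_file", "app.py"), ("fix_typo", "app.py"), ("done", "typo fixed")]
    return script[len(transcript) - 1]


if __name__ == "__main__":
    tools = {"read_file": read_file, "fix_typo": fix_typo}
    for line in run_agent(scripted_policy, tools, "fix the typo"):
        print(line)
```

In a real setup the scripted policy would be replaced by a call to the model, and the tools would touch an actual file system and shell. The `max_steps` cap is a common safeguard against an agent that never decides it is finished.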

Architecture and Specifications

North Mini Code uses a mixture-of-experts (MoE) design that activates only a fraction of its parameters on each step, keeping inference cheap relative to its total size.

Specification | Detail
Total parameters | 30 billion
Active parameters | 3 billion (mixture-of-experts)
Context window | 256,000 tokens
Max generation | 64,000 tokens
License | Apache 2.0
Minimum hardware | Single Nvidia H100 at FP8 precision
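The two token limits in the table suggest a simple budgeting check before sending a request. The sketch below assumes that prompt and output share the 256,000-token window, which the source does not state; the function name is hypothetical.

```python
CONTEXT_WINDOW = 256_000  # tokens, from the specification table
MAX_GENERATION = 64_000   # tokens, from the specification table


def output_budget(prompt_tokens: int, requested_output: int) -> int:
    """Return how many output tokens can be granted for a request.

    Assumption: prompt and output share the context window. The source
    does not say whether the two limits interact this way.
    """
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    room_left = max(CONTEXT_WINDOW - prompt_tokens, 0)
    return min(requested_output, MAX_GENERATION, room_left)


if __name__ == "__main__":
    print(output_budget(100_000, 80_000))  # capped by the generation limit
    print(output_budget(220_000, 64_000))  # capped by the remaining window
```

Under that assumption, a 220,000-token prompt leaves room for only 36,000 output tokens, so very large repository dumps trade directly against how much the model can write back.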

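The 30 billion total / 3 billion active split is the core efficiency story: the model draws on the capacity of a 30 billion-parameter network, while the compute spent on each token is closer to that of a 3 billion-parameter model. All 30 billion parameters must still be held in memory, so the saving is in compute per token rather than in memory footprint.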
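To make "only a fraction of parameters active" concrete, here is a toy sketch of top-k expert routing, a common mixture-of-experts pattern. The source does not describe North Mini Code's actual router, so the gating scheme, the function names, and the scalar "experts" are all assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def top_k_route(gate_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    if not 0 < k <= len(gate_logits):
        raise ValueError("k must be between 1 and the number of experts")
    order = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = order[:k]
    peak = max(gate_logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = [math.exp(gate_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


def moe_forward(
    x: float,
    experts: Sequence[Callable[[float], float]],
    gate_logits: Sequence[float],
    k: int,
) -> float:
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(weight * experts[i](x) for i, weight in top_k_route(gate_logits, k))


if __name__ == "__main__":
    # Four toy experts that just scale their input; real experts are neural sub-networks.
    experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
    logits = [0.0, 1.0, 3.0, 2.0]
    print(top_k_route(logits, k=2))
    print(moe_forward(2.0, experts, logits, k=1))
```

With `k=1`, only one of the four toy experts runs for this input, yet all four must exist in memory so that a different input can route elsewhere. That is the same compute-versus-memory distinction as the 3 billion active versus 30 billion total split.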

Performance

Cohere positions North Mini Code as competitive among similarly sized open models:

  • 33.4 on the Artificial Analysis Coding Index — a composite coding-capability score
  • Up to 2.8-times the output throughput of Devstral Small 2 on comparable hardware
  • A 30 percent advantage in inter-token latency, for more consistent generation speed

These are throughput-and-latency wins as much as capability wins — North Mini Code is pitched as a fast, deployable workhorse rather than a frontier-topping model.
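A quick calculation shows what a throughput multiplier means in wall-clock terms. The baseline rate of 100 tokens per second below is a made-up figure, not a measurement of Devstral Small 2, and treating throughput as a single-stream rate is a simplification. Since the 2.8 figure is an "up to" claim, the result is a best case.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to emit output_tokens at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


def seconds_saved(output_tokens: int, baseline_tps: float, speedup: float = 2.8) -> float:
    """Wall-clock time saved if throughput is `speedup` times the baseline."""
    baseline = generation_seconds(output_tokens, baseline_tps)
    faster = generation_seconds(output_tokens, baseline_tps * speedup)
    return baseline - faster


if __name__ == "__main__":
    hypothetical_baseline_tps = 100.0  # illustrative number, not a benchmark result
    tokens = 28_000
    print(round(generation_seconds(tokens, hypothetical_baseline_tps)))  # 280
    print(round(seconds_saved(tokens, hypothetical_baseline_tps)))       # 180
```

At that hypothetical baseline, a 28,000-token agent run drops from roughly 280 seconds to roughly 100. For agent loops that make many model calls per task, the difference compounds across steps.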

Access and Deployment

North Mini Code is available through multiple channels:

  • Hugging Face — downloadable open weights
  • Cohere API — hosted inference
  • Cohere Model Vault — managed inference for enterprises
  • OpenRouter and OpenCode — third-party routing and agent platforms

Because the weights are open under Apache 2.0, teams can self-host on a single H100 for full data control, or call the hosted API when they would rather not run infrastructure.
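A rough sizing check helps explain why a single H100 at FP8 is plausible. The sketch counts weight memory only and assumes the 80 GB H100 variant; KV cache, activations, and runtime overhead are not modelled, so the leftover figure is an upper bound on usable headroom.

```python
TOTAL_PARAMS = 30e9   # from the specification table
FP8_BYTES = 1         # one byte per parameter at FP8
BF16_BYTES = 2        # two bytes per parameter at BF16, for comparison
H100_MEMORY_GB = 80   # assumption: the 80 GB H100 variant


def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bytes_per_param / 1e9


def headroom_gb(total_params: float, bytes_per_param: float, device_gb: float) -> float:
    """Device memory left after loading weights; negative means they do not fit."""
    return device_gb - weight_memory_gb(total_params, bytes_per_param)


if __name__ == "__main__":
    print(weight_memory_gb(TOTAL_PARAMS, FP8_BYTES))             # 30.0
    print(headroom_gb(TOTAL_PARAMS, FP8_BYTES, H100_MEMORY_GB))  # 50.0
    print(headroom_gb(TOTAL_PARAMS, BF16_BYTES, H100_MEMORY_GB)) # 20.0
```

Note that the calculation uses all 30 billion parameters, not the 3 billion active ones: every expert has to be resident even though only some run per token. Long contexts also consume memory for the KV cache, so real headroom depends on how much of the 256,000-token window is in use.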

Pricing

Open weights: Free (Apache 2.0)
  • Self-host on a single H100
  • Full data control
  • Commercial use permitted

Cohere API: Usage-based
  • Hosted inference
  • No infrastructure to manage

Model Vault: Enterprise
  • Managed inference
  • Private deployment
  • Enterprise support

The Apache 2.0 license permits commercial use and modification, making North Mini Code a genuine option for teams that need an on-premises or air-gapped coding model.
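The choice between the three options can be written down as a small decision helper. This is one possible reading of the trade-offs listed above, not guidance from Cohere; the `Requirements` fields and the order of the checks are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    must_stay_on_premises: bool  # e.g. air-gapped or strict data-control rules
    has_h100_capacity: bool      # at least one H100 available for FP8 inference
    wants_vendor_support: bool   # prefers a managed, supported deployment


def suggest_channel(req: Requirements) -> str:
    """Map requirements to one of the access options (a heuristic sketch)."""
    if req.must_stay_on_premises:
        # Only the open weights can run fully inside the team's own environment.
        if req.has_h100_capacity:
            return "open weights (self-host)"
        return "open weights (self-host), once an H100 is available"
    if req.wants_vendor_support:
        return "Model Vault"
    return "Cohere API"


if __name__ == "__main__":
    air_gapped = Requirements(True, True, False)
    startup = Requirements(False, False, False)
    print(suggest_channel(air_gapped))  # open weights (self-host)
    print(suggest_channel(startup))     # Cohere API
```

Checking the on-premises constraint first reflects that it is the only hard requirement in the list; the other two are preferences that a team could reasonably weigh differently.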

How It Compares

Model | Type | Differentiator
North Mini Code (Cohere) | Open weights (Apache 2.0) | Efficient 30B/3B MoE; high throughput; enterprise deployment options
Devstral Small 2 (Mistral) | Open weights | North Mini Code claims up to 2.8-times its throughput
Kimi Code (Moonshot) | Open weights | Long-context agentic coding
MAI-Code-1-Flash (Microsoft) | Hosted | Lightweight coding model

Strengths

  • Open weights under Apache 2.0 — self-host, modify, and use commercially with full data control
  • Efficient MoE design — 30 billion total but only 3 billion active parameters keeps inference cheap
  • High throughput — up to 2.8-times the output throughput of Devstral Small 2 with a 30 percent inter-token latency advantage
  • Flexible access — Hugging Face weights, Cohere API, Model Vault, OpenRouter, and OpenCode
  • Modest hardware floor — runs on a single Nvidia H100 at FP8

Limitations and Considerations

  • Not a frontier-topping model — a 33.4 Coding Index score is competitive for its size, not state-of-the-art versus hosted flagships
  • Cohere's first coding model — a newer entry without the track record of established coding tools
  • Self-hosting still needs an H100 — accessible for organizations but not laptop-class hardware
  • Best as an agent backend — tuned for agentic and terminal workflows, less a drop-in chat assistant

Related Tools

  • Cohere Coral: Cohere's enterprise chat assistant built on its Command model family
  • Kimi Code: Moonshot's open agentic coding model
  • MAI-Code-1-Flash: Microsoft's lightweight coding model
  • Cursor: agentic AI code editor that can run open models as a backend

Key Takeaways

  • North Mini Code is Cohere's first agentic coding model and first open-weights release (June 10, 2026) — a strategic pivot toward open models for a company known for closed enterprise products
  • It is a 30 billion-parameter mixture-of-experts model that activates just 3 billion parameters, handles a 256,000-token context, and ships under Apache 2.0 on Hugging Face
  • Cohere reports 33.4 on the Artificial Analysis Coding Index and up to 2.8-times the output throughput of Devstral Small 2, with a 30 percent inter-token latency advantage
  • Available via Hugging Face weights, the Cohere API, Model Vault, OpenRouter, and OpenCode; runs on a single Nvidia H100 at FP8
  • Best understood as a fast, deployable open workhorse for agentic and terminal coding workflows rather than a frontier-topping model
