Learning Objectives
- Understand what North Mini Code is and why it marks a strategic shift for Cohere
- Evaluate the model's mixture-of-experts architecture and efficiency claims
- Identify when an open-weights coding model is the right fit versus a hosted frontier model
What Is North Mini Code?
North Mini Code is Cohere's first agentic coding model and its first open-weights release, launched on June 10, 2026. For a company long known for enterprise retrieval and its closed Command model family, shipping an open-weights model under a permissive license is a notable strategic pivot — it puts Cohere in direct competition with open coding models from Mistral, Moonshot, and Microsoft.
The model is built for code generation, agentic software-engineering workflows, and terminal task execution — the "do work in a repository" use cases rather than single-shot autocompletion.
💡Key Concept
Agentic coding model: A model tuned to operate inside a software project — reading files, running commands, editing across a codebase, and iterating toward a goal — rather than just suggesting the next line. North Mini Code is designed to be driven by coding agents and terminal harnesses.
Architecture and Specifications
North Mini Code uses a mixture-of-experts (MoE) design that activates only a fraction of its parameters on each step, keeping inference cheap relative to its total size.
| Specification | Detail |
|---|---|
| Total parameters | 30 billion |
| Active parameters | 3 billion (mixture-of-experts) |
| Context window | 256,000 tokens |
| Max generation | 64,000 tokens |
| License | Apache 2.0 |
| Minimum hardware | Single Nvidia H100 at FP8 precision |
The 30 billion total / 3 billion active split is the core efficiency story: the model reasons with the capacity of a large network while costing closer to a 3 billion-parameter model to run.
Performance
Cohere positions North Mini Code as competitive among similarly sized open models:
- 33.4 on the Artificial Analysis Coding Index — a composite coding-capability score
- Up to 2.8-times the output throughput of Devstral Small 2 on comparable hardware
- A 30 percent advantage in inter-token latency, for more consistent generation speed
These are throughput-and-latency wins as much as capability wins — North Mini Code is pitched as a fast, deployable workhorse rather than a frontier-topping model.
Access and Deployment
North Mini Code is available through multiple channels:
- Hugging Face — downloadable open weights
- Cohere API — hosted inference
- Cohere Model Vault — managed inference for enterprises
- OpenRouter and OpenCode — third-party routing and agent platforms
Because the weights are open under Apache 2.0, teams can self-host on a single H100 for full data control, or call the hosted API when they would rather not run infrastructure.
Pricing
- Self-host on a single H100
- Full data control
- Commercial use permitted
- Hosted inference
- No infrastructure to manage
- Managed inference
- Private deployment
- Enterprise support
The Apache 2.0 license permits commercial use and modification, making North Mini Code a genuine option for teams that need an on-premises or air-gapped coding model.
How It Compares
| Model | Type | Differentiator |
|---|---|---|
| North Mini Code (Cohere) | Open weights (Apache 2.0) | Efficient 30B/3B MoE; high throughput; enterprise deployment options |
| Devstral Small 2 (Mistral) | Open weights | North Mini Code claims up to 2.8-times its throughput |
| Kimi Code (Moonshot) | Open weights | Long-context agentic coding |
| MAI-Code-1-Flash (Microsoft) | Hosted | Lightweight coding model |
Strengths
- Open weights under Apache 2.0 — self-host, modify, and use commercially with full data control
- Efficient MoE design — 30 billion total but only 3 billion active parameters keeps inference cheap
- High throughput — up to 2.8-times the output throughput of Devstral Small 2 with a 30 percent inter-token latency advantage
- Flexible access — Hugging Face weights, Cohere API, Model Vault, OpenRouter, and OpenCode
- Modest hardware floor — runs on a single Nvidia H100 at FP8
Limitations and Considerations
- Not a frontier-topping model — a 33.4 Coding Index score is competitive for its size, not state-of-the-art versus hosted flagships
- Cohere's first coding model — a newer entry without the track record of established coding tools
- Self-hosting still needs an H100 — accessible for organizations but not laptop-class hardware
- Best as an agent backend — tuned for agentic and terminal workflows, less a drop-in chat assistant
Related Tools
- Cohere Coral — Cohere's enterprise chat assistant built on its Command model family
- Kimi Code — Moonshot's open agentic coding model
- MAI-Code-1-Flash — Microsoft's lightweight coding model
- Cursor — Agentic AI code editor that can run open models as a backend
Key Takeaways
- North Mini Code is Cohere's first agentic coding model and first open-weights release (June 10, 2026) — a strategic pivot toward open models for a company known for closed enterprise products
- It is a 30 billion-parameter mixture-of-experts model that activates just 3 billion parameters, handles a 256,000-token context, and ships under Apache 2.0 on Hugging Face
- Cohere reports 33.4 on the Artificial Analysis Coding Index and up to 2.8-times the output throughput of Devstral Small 2, with a 30 percent inter-token latency advantage
- Available via Hugging Face weights, the Cohere API, Model Vault, OpenRouter, and OpenCode; runs on a single Nvidia H100 at FP8
- Best understood as a fast, deployable open workhorse for agentic and terminal coding workflows rather than a frontier-topping model