Name: Aya Expanse
Availability: InStock
Author: Cohere

Learning Objectives

Understand how the Aya family addresses the language equity gap in AI and why low-resource language support matters
Identify the capabilities across the Aya model lineup: Aya Expanse, Aya Vision, and Tiny Aya
Evaluate when Aya models are the right choice versus alternatives like Qwen 3.5, Gemma 3, or Llama 4

What Is Aya Expanse?

Aya Expanse is an open-source multilingual language model from Cohere For AI, the nonprofit research arm of Cohere (valued at $7 billion, with $240 million ARR and an IPO expected in 2026). It is specifically designed to bring high-quality AI capabilities to 23 languages — including many that are systematically underserved by major AI models, such as Hindi, Swahili, Arabic, Turkish, Vietnamese, and other languages spoken by billions of people worldwide.

The problem Aya Expanse addresses is fundamental: most frontier AI models are overwhelmingly trained on English data, with other languages treated as secondary. This means that for the majority of the world's population — people who primarily speak Hindi, Arabic, Swahili, Yoruba, or dozens of other languages — AI tools deliver significantly worse performance than they do for English speakers. Aya Expanse was built specifically to close this gap.

What makes Aya Expanse distinctive is its research-driven approach to language equity. Rather than simply adding multilingual training data to an English-first model, the Aya project involved thousands of contributors from around the world who created high-quality training data in their native languages. This community-driven data collection produced training sets that reflect how people actually use these languages — not just translated English content.

Since the original release, the Aya family has expanded significantly with Aya Vision (multimodal) and Tiny Aya (edge-optimized for 70+ languages), making it the most comprehensive open-source multilingual AI initiative in the world.

💡Key Concept

The AI language gap: As of 2026, over 80% of internet content used to train AI models is in English — yet only about 16% of the world's population speaks English natively. This creates a structural bias: AI tools work best for the linguistic minority and worst for the majority. Aya Expanse is one of the most significant efforts to address this imbalance, focusing on languages where high-quality training data has historically been scarce. The project's name "Aya" means "open" in multiple languages.

✅Tip

Try the Aya models: Available on Hugging Face — Aya Expanse, Aya Vision, and Tiny Aya variants are all open-source for research and commercial use

The Aya Model Family

Model	Released	Languages	Key Feature
Aya Expanse	2024	23	Core multilingual text model; community-driven training data from native speakers
Aya Vision	March 2025	23	Multimodal — accepts images + text input; visual reasoning across 23 languages
Tiny Aya Global	February 2026	70+	3.35 billion parameters; broadest language coverage of any edge model; runs on laptops
Tiny Aya Earth (Africa)	February 2026	African languages	Regional variant optimized for African languages
Tiny Aya Fire (South Asia)	February 2026	South Asian languages	Regional variant optimized for Hindi, Bengali, Tamil, and more
Tiny Aya Water (Asia-Pacific/Europe)	February 2026	AP + European languages	Regional variant for Asian-Pacific and European languages

Aya Vision (March 2025)

Aya Vision extends the Aya family into multimodal territory — it accepts both images and text as input, enabling visual question answering, image description, chart interpretation, and document understanding across all 23 Aya languages. This is significant because most multimodal models are heavily English-biased in their visual reasoning; Aya Vision can describe an image in Hindi, analyze a chart in Arabic, or read a document in Turkish with native-quality understanding.

Tiny Aya (February 2026)

Tiny Aya represents a breakthrough in accessible multilingual AI. At just 3.35 billion parameters, these models support over 70 languages — making them the broadest-coverage edge-deployable language models available anywhere. They are designed to run on laptops, smartphones, and edge devices without requiring cloud connectivity or GPU infrastructure.

The regional variants (Earth, Fire, Water) are optimized for specific language clusters, delivering higher quality for their target regions than the Global variant while maintaining the same compact size. This allows organizations to deploy a variant tuned for their specific geographic context.

Pricing & Access

Option	Price	Details
Open Source (Hugging Face)	Free	All Aya models — full weights for download, fine-tuning, and deployment
Cohere API	Pay-per-token	Access Aya alongside Command R+ and enterprise models
Research Use	Free	Specifically designed for academic research with open data and methodology

All Aya models are fully open-source, reflecting Cohere For AI's nonprofit mission. There are no licensing restrictions for research or commercial use — organizations can download, fine-tune, and deploy the models without fees.

The Aya family sits alongside Cohere's enterprise-focused models:

Command R+: Cohere's commercial flagship for enterprise RAG, long-context retrieval, and tool use
Rerank 4: Retrieval reranking model with 32K context window supporting over 100 languages — essential for enterprise RAG pipelines that need multilingual search
Embed v3.0: Multimodal embedding model for search and retrieval — generates embeddings from both text and images across multiple languages

These enterprise models complement the open-source Aya family — organizations can use Aya for multilingual generation and Cohere's commercial models for enterprise search and retrieval infrastructure.

Core Capabilities

23-Language Coverage Including Low-Resource Languages

Aya Expanse supports 23 languages with high-quality performance: Arabic, Chinese (Simplified), Czech, Dutch, English, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian, and Vietnamese. The model's training explicitly prioritizes languages that other models underserve.

70+ Languages on Edge Devices (Tiny Aya)

Tiny Aya dramatically expands language coverage to over 70 languages at a size (3.35 billion parameters) that runs on consumer hardware. No other edge-deployable model comes close to this language breadth — making Tiny Aya uniquely suited for offline deployments, mobile applications, and resource-constrained environments in developing regions.

Multimodal Multilingual Understanding (Aya Vision)

Aya Vision combines visual and textual understanding across all 23 supported languages. It can interpret images, charts, documents, and screenshots in context — a capability that is essential for real-world multilingual applications where information is not purely textual.

Community-Driven Training Data

Unlike models that rely on scraped web data, Aya's training included high-quality data contributed by native speakers across participating languages. This produces more natural, culturally appropriate output — the model understands idiomatic expressions, cultural context, and linguistic nuances that translation-based approaches miss.

Research-Backed Methodology

Every aspect of Aya Expanse's development — from data collection to training methodology to evaluation — is published as open research. This transparency allows other researchers to build on the work, reproduce results, and extend coverage to additional languages. The Aya project has produced multiple peer-reviewed papers advancing the state of multilingual AI research.

Strengths

Best-in-class low-resource language support: Purpose-built for languages that frontier models systematically underserve
Broadest edge model language coverage: Tiny Aya's 70+ languages at 3.35 billion parameters is unmatched by any other compact model
Full model family: Text (Aya Expanse), vision (Aya Vision), and edge (Tiny Aya) models cover a wide range of deployment scenarios
Open-source with no restrictions: Full weights available for research and commercial deployment across all variants
Community-driven data quality: Native speaker contributions produce more natural output than translation-based training
Research transparency: Published methodology and open datasets enable reproducibility and extension
Nonprofit mission: Cohere For AI's nonprofit structure aligns incentives with language equity rather than profit maximization
Regional optimization: Tiny Aya's regional variants (Earth, Fire, Water) provide higher quality for specific language clusters

Limitations & Considerations

Smaller than frontier models: Aya models prioritize multilingual breadth over raw scale — they are not designed to compete with GPT-5.5 or Claude on English-only benchmarks
23-language core (Expanse/Vision): While Tiny Aya extends to 70+ languages, the full-capability Expanse and Vision models cover 23 — thousands of languages remain unsupported
Less consumer polish: No dedicated chatbot interface like ChatGPT — primarily accessed via API or Hugging Face
Tiny Aya quality tradeoffs: At 3.35 billion parameters, Tiny Aya models sacrifice some depth and nuance compared to larger models — best for simpler tasks in supported languages

Best Use Cases

Task	Why Aya
Multilingual content creation	Native-quality output in 23+ languages including underserved ones like Swahili and Hindi
Edge deployment in developing regions	Tiny Aya runs offline on laptops/phones with 70+ language support — no cloud needed
Multilingual visual understanding	Aya Vision interprets images and documents across 23 languages
Cross-language research	Analyze and summarize documents across multiple languages with cultural context
NGO and humanitarian applications	Free, open models for organizations serving non-English-speaking populations
Academic multilingual NLP research	Open methodology, data, and weights enable reproducible research
Localization and translation	Community-trained data produces more natural translations than English-first models

When to choose alternatives:

Maximum multilingual coverage (100+ languages) with general-purpose capabilities → Qwen 3.5
Lightweight multilingual model with strong English → Gemma 3
English-first tasks with largest model ecosystem → Llama 4
Enterprise RAG with citations and reranking → Command R+ with Rerank 4 (Cohere's commercial models)
Multimodal reasoning in English → GPT-5.5 or Claude for stronger English-language vision capabilities

Getting Started

Visit the Cohere For AI collection on Hugging Face to browse all Aya models — Expanse, Vision, and Tiny Aya variants
For edge deployment, download a Tiny Aya variant (Global, Earth, Fire, or Water) based on your target languages
For multimodal tasks, try Aya Vision with image + text prompts in a non-English language
Test with prompts in a non-English language you know well — evaluate naturalness and cultural appropriateness
Try a cross-language task: provide input in one language and request output in another to test multilingual transfer
For API access, sign up on cohere.com and select the Aya model endpoint
Explore the published Aya research papers to understand the training methodology and contribute to future data collection efforts

✅Tip

Practical tip: The best way to evaluate Aya models is to test them in a language you speak natively (other than English). Ask them to write in that language, explain cultural concepts, or translate idiomatic expressions. The difference between Aya's output and a general-purpose model's output is most visible in these culturally-grounded tasks — where English-first models often produce technically correct but culturally awkward results. For Tiny Aya, test on a laptop without internet to experience the edge deployment scenario.

Key Takeaways

The Aya family is the most comprehensive open-source multilingual AI initiative — spanning text (Aya Expanse, 23 languages), vision (Aya Vision, multimodal + 23 languages), and edge (Tiny Aya, 70+ languages at 3.35 billion parameters)
Tiny Aya offers the broadest language coverage of any edge-deployable model — 70+ languages at a size that runs on laptops and phones, with regional variants optimized for Africa, South Asia, and Asia-Pacific/Europe
Aya Vision brings multimodal understanding to multilingual AI — interpreting images and documents across 23 languages where most vision models are English-biased
Cohere (valued at $7 billion, IPO expected 2026) backs the Aya project through its nonprofit Cohere For AI arm, alongside enterprise models like Command R+, Rerank 4, and Embed v3.0
Community-driven training data from native speakers produces more natural and culturally appropriate output than translation-based approaches — a model for how inclusive AI should be built

Aya Expanse

Audio & video lessons are paid features

Learning Objectives

What Is Aya Expanse?

The Aya Model Family

Aya Vision (March 2025)

Tiny Aya (February 2026)

Pricing & Access

Core Capabilities

23-Language Coverage Including Low-Resource Languages

70+ Languages on Edge Devices (Tiny Aya)

Multimodal Multilingual Understanding (Aya Vision)

Community-Driven Training Data

Research-Backed Methodology

Strengths

Limitations & Considerations

Best Use Cases

Getting Started

Key Takeaways

Save your progress & take the quiz

Audio & video lessons are paid features

Learning Objectives

What Is Aya Expanse?

The Aya Model Family

Aya Vision (March 2025)

Tiny Aya (February 2026)

Pricing & Access

Related Cohere Models

Core Capabilities

23-Language Coverage Including Low-Resource Languages

70+ Languages on Edge Devices (Tiny Aya)

Multimodal Multilingual Understanding (Aya Vision)

Community-Driven Training Data

Research-Backed Methodology

Strengths

Limitations & Considerations

Best Use Cases

Getting Started

Key Takeaways

Save your progress & take the quiz