Free to read. Sign up to save your progress and take knowledge-check quizzes.

Sign up free
8 min read·Updated April 28, 2026

Aya Expanse

Cohere logoBy Cohere

Aya Expanse is Cohere's open-source multilingual model family, spanning from the original 23-language text model to Aya Vision (multimodal) and Tiny Aya (70+ languages on edge devices). It advances language equity in AI.

Listen to this lesson

Free preview · first 0:30
0:00 / 0:30

Audio & video lessons are paid features

Plus unlocks audio streaming. Pro adds downloadable audio, video, certificates, and more.

Plus adds:
  • Audio streaming
  • Downloadable PDFs
  • All AI Playbooks
  • Personalized content
Pro also adds:
  • Certificates of completion
  • Audio MP3 downloads
  • Video lessonssoon
  • & More…soon

Watch this lesson

Video coming soon

Learning Objectives

  • Understand how the Aya family addresses the language equity gap in AI and why low-resource language support matters
  • Identify the capabilities across the Aya model lineup: Aya Expanse, Aya Vision, and Tiny Aya
  • Evaluate when Aya models are the right choice versus alternatives like Qwen 3.5, Gemma 3, or Llama 4

What Is Aya Expanse?

Aya Expanse is an open-source multilingual language model from Cohere For AI, the nonprofit research arm of Cohere (valued at $7 billion, with $240 million ARR and an IPO expected in 2026). It is specifically designed to bring high-quality AI capabilities to 23 languages — including many that are systematically underserved by major AI models, such as Hindi, Swahili, Arabic, Turkish, Vietnamese, and other languages spoken by billions of people worldwide.

The problem Aya Expanse addresses is fundamental: most frontier AI models are overwhelmingly trained on English data, with other languages treated as secondary. This means that for the majority of the world's population — people who primarily speak Hindi, Arabic, Swahili, Yoruba, or dozens of other languages — AI tools deliver significantly worse performance than they do for English speakers. Aya Expanse was built specifically to close this gap.

What makes Aya Expanse distinctive is its research-driven approach to language equity. Rather than simply adding multilingual training data to an English-first model, the Aya project involved thousands of contributors from around the world who created high-quality training data in their native languages. This community-driven data collection produced training sets that reflect how people actually use these languages — not just translated English content.

Since the original release, the Aya family has expanded significantly with Aya Vision (multimodal) and Tiny Aya (edge-optimized for 70+ languages), making it the most comprehensive open-source multilingual AI initiative in the world.

💡Key Concept

The AI language gap: As of 2026, over 80% of internet content used to train AI models is in English — yet only about 16% of the world's population speaks English natively. This creates a structural bias: AI tools work best for the linguistic minority and worst for the majority. Aya Expanse is one of the most significant efforts to address this imbalance, focusing on languages where high-quality training data has historically been scarce. The project's name "Aya" means "open" in multiple languages.

Tip

Try the Aya models: Available on Hugging Face — Aya Expanse, Aya Vision, and Tiny Aya variants are all open-source for research and commercial use

The Aya Model Family

ModelReleasedLanguagesKey Feature
Aya Expanse202423Core multilingual text model; community-driven training data from native speakers
Aya VisionMarch 202523Multimodal — accepts images + text input; visual reasoning across 23 languages
Tiny Aya GlobalFebruary 202670+3.35 billion parameters; broadest language coverage of any edge model; runs on laptops
Tiny Aya Earth (Africa)February 2026African languagesRegional variant optimized for African languages
Tiny Aya Fire (South Asia)February 2026South Asian languagesRegional variant optimized for Hindi, Bengali, Tamil, and more
Tiny Aya Water (Asia-Pacific/Europe)February 2026AP + European languagesRegional variant for Asian-Pacific and European languages

Aya Vision (March 2025)

Aya Vision extends the Aya family into multimodal territory — it accepts both images and text as input, enabling visual question answering, image description, chart interpretation, and document understanding across all 23 Aya languages. This is significant because most multimodal models are heavily English-biased in their visual reasoning; Aya Vision can describe an image in Hindi, analyze a chart in Arabic, or read a document in Turkish with native-quality understanding.

Tiny Aya (February 2026)

Tiny Aya represents a breakthrough in accessible multilingual AI. At just 3.35 billion parameters, these models support over 70 languages — making them the broadest-coverage edge-deployable language models available anywhere. They are designed to run on laptops, smartphones, and edge devices without requiring cloud connectivity or GPU infrastructure.

The regional variants (Earth, Fire, Water) are optimized for specific language clusters, delivering higher quality for their target regions than the Global variant while maintaining the same compact size. This allows organizations to deploy a variant tuned for their specific geographic context.

Pricing & Access

OptionPriceDetails
Open Source (Hugging Face)FreeAll Aya models — full weights for download, fine-tuning, and deployment
Cohere APIPay-per-tokenAccess Aya alongside Command R+ and enterprise models
Research UseFreeSpecifically designed for academic research with open data and methodology

All Aya models are fully open-source, reflecting Cohere For AI's nonprofit mission. There are no licensing restrictions for research or commercial use — organizations can download, fine-tune, and deploy the models without fees.

The Aya family sits alongside Cohere's enterprise-focused models:

  • Command R+: Cohere's commercial flagship for enterprise RAG, long-context retrieval, and tool use
  • Rerank 4: Retrieval reranking model with 32K context window supporting over 100 languages — essential for enterprise RAG pipelines that need multilingual search
  • Embed v3.0: Multimodal embedding model for search and retrieval — generates embeddings from both text and images across multiple languages

These enterprise models complement the open-source Aya family — organizations can use Aya for multilingual generation and Cohere's commercial models for enterprise search and retrieval infrastructure.

Core Capabilities

23-Language Coverage Including Low-Resource Languages

Aya Expanse supports 23 languages with high-quality performance: Arabic, Chinese (Simplified), Czech, Dutch, English, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Ukrainian, and Vietnamese. The model's training explicitly prioritizes languages that other models underserve.

70+ Languages on Edge Devices (Tiny Aya)

Tiny Aya dramatically expands language coverage to over 70 languages at a size (3.35 billion parameters) that runs on consumer hardware. No other edge-deployable model comes close to this language breadth — making Tiny Aya uniquely suited for offline deployments, mobile applications, and resource-constrained environments in developing regions.

Multimodal Multilingual Understanding (Aya Vision)

Aya Vision combines visual and textual understanding across all 23 supported languages. It can interpret images, charts, documents, and screenshots in context — a capability that is essential for real-world multilingual applications where information is not purely textual.

Community-Driven Training Data

Unlike models that rely on scraped web data, Aya's training included high-quality data contributed by native speakers across participating languages. This produces more natural, culturally appropriate output — the model understands idiomatic expressions, cultural context, and linguistic nuances that translation-based approaches miss.

Research-Backed Methodology

Every aspect of Aya Expanse's development — from data collection to training methodology to evaluation — is published as open research. This transparency allows other researchers to build on the work, reproduce results, and extend coverage to additional languages. The Aya project has produced multiple peer-reviewed papers advancing the state of multilingual AI research.

Strengths

  • Best-in-class low-resource language support: Purpose-built for languages that frontier models systematically underserve
  • Broadest edge model language coverage: Tiny Aya's 70+ languages at 3.35 billion parameters is unmatched by any other compact model
  • Full model family: Text (Aya Expanse), vision (Aya Vision), and edge (Tiny Aya) models cover a wide range of deployment scenarios
  • Open-source with no restrictions: Full weights available for research and commercial deployment across all variants
  • Community-driven data quality: Native speaker contributions produce more natural output than translation-based training
  • Research transparency: Published methodology and open datasets enable reproducibility and extension
  • Nonprofit mission: Cohere For AI's nonprofit structure aligns incentives with language equity rather than profit maximization
  • Regional optimization: Tiny Aya's regional variants (Earth, Fire, Water) provide higher quality for specific language clusters

Limitations & Considerations

  • Smaller than frontier models: Aya models prioritize multilingual breadth over raw scale — they are not designed to compete with GPT-5.5 or Claude on English-only benchmarks
  • 23-language core (Expanse/Vision): While Tiny Aya extends to 70+ languages, the full-capability Expanse and Vision models cover 23 — thousands of languages remain unsupported
  • Less consumer polish: No dedicated chatbot interface like ChatGPT — primarily accessed via API or Hugging Face
  • Tiny Aya quality tradeoffs: At 3.35 billion parameters, Tiny Aya models sacrifice some depth and nuance compared to larger models — best for simpler tasks in supported languages

Best Use Cases

TaskWhy Aya
Multilingual content creationNative-quality output in 23+ languages including underserved ones like Swahili and Hindi
Edge deployment in developing regionsTiny Aya runs offline on laptops/phones with 70+ language support — no cloud needed
Multilingual visual understandingAya Vision interprets images and documents across 23 languages
Cross-language researchAnalyze and summarize documents across multiple languages with cultural context
NGO and humanitarian applicationsFree, open models for organizations serving non-English-speaking populations
Academic multilingual NLP researchOpen methodology, data, and weights enable reproducible research
Localization and translationCommunity-trained data produces more natural translations than English-first models

When to choose alternatives:

  • Maximum multilingual coverage (100+ languages) with general-purpose capabilities → Qwen 3.5
  • Lightweight multilingual model with strong English → Gemma 3
  • English-first tasks with largest model ecosystem → Llama 4
  • Enterprise RAG with citations and reranking → Command R+ with Rerank 4 (Cohere's commercial models)
  • Multimodal reasoning in English → GPT-5.5 or Claude for stronger English-language vision capabilities

Getting Started

  1. Visit the Cohere For AI collection on Hugging Face to browse all Aya models — Expanse, Vision, and Tiny Aya variants
  2. For edge deployment, download a Tiny Aya variant (Global, Earth, Fire, or Water) based on your target languages
  3. For multimodal tasks, try Aya Vision with image + text prompts in a non-English language
  4. Test with prompts in a non-English language you know well — evaluate naturalness and cultural appropriateness
  5. Try a cross-language task: provide input in one language and request output in another to test multilingual transfer
  6. For API access, sign up on cohere.com and select the Aya model endpoint
  7. Explore the published Aya research papers to understand the training methodology and contribute to future data collection efforts

Tip

Practical tip: The best way to evaluate Aya models is to test them in a language you speak natively (other than English). Ask them to write in that language, explain cultural concepts, or translate idiomatic expressions. The difference between Aya's output and a general-purpose model's output is most visible in these culturally-grounded tasks — where English-first models often produce technically correct but culturally awkward results. For Tiny Aya, test on a laptop without internet to experience the edge deployment scenario.

Key Takeaways

  • The Aya family is the most comprehensive open-source multilingual AI initiative — spanning text (Aya Expanse, 23 languages), vision (Aya Vision, multimodal + 23 languages), and edge (Tiny Aya, 70+ languages at 3.35 billion parameters)
  • Tiny Aya offers the broadest language coverage of any edge-deployable model — 70+ languages at a size that runs on laptops and phones, with regional variants optimized for Africa, South Asia, and Asia-Pacific/Europe
  • Aya Vision brings multimodal understanding to multilingual AI — interpreting images and documents across 23 languages where most vision models are English-biased
  • Cohere (valued at $7 billion, IPO expected 2026) backs the Aya project through its nonprofit Cohere For AI arm, alongside enterprise models like Command R+, Rerank 4, and Embed v3.0
  • Community-driven training data from native speakers produces more natural and culturally appropriate output than translation-based approaches — a model for how inclusive AI should be built

Save your progress & take the quiz

Sign up free to bookmark lessons, track which modules you've completed, and lock in what you learned with a quick knowledge-check quiz at the end of each lesson.

🧭Recommended for you