7 min read·Updated May 30, 2026

Liquid LFM (Liquid AI)

By Liquid AI

Liquid LFM is the family of open-weight Liquid Foundation Models from MIT CSAIL spinoff Liquid AI, engineered for on-device inference. The current flagship, LFM2.5-8B-A1B, is an 8 billion-parameter mixture-of-experts model with roughly 1 billion active parameters per token, pretrained on 38 trillion tokens, with a 128,000-token context window and day-one runtime support across llama.cpp, MLX, vLLM, SGLang, and ONNX. Liquid reports around 253 tokens per second on an Apple M5 Max and roughly 30 tokens per second on flagship smartphones.

Learning Objectives

  • Understand how Liquid AI's Liquid Foundation Models differ from frontier transformer-based LLMs in architecture and intended deployment
  • Identify the headline 2026 release LFM2.5-8B-A1B and its parameter, training, and runtime characteristics
  • Evaluate when a Liquid LFM is the right choice for on-device, edge, or data-residency-constrained AI workloads

What Is Liquid LFM?

Liquid LFM — the Liquid Foundation Model line — is the family of open-weight language models built by Liquid AI, a Cambridge, Massachusetts foundation-model lab spun out of MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) in 2023. The four founders — Ramin Hasani (CEO), Mathias Lechner, Alexander Amini, and Daniela Rus, the CSAIL director — built the company around their academic work on liquid neural networks, a class of continuous-time models originally developed for robotic control and adapted into the LFM line.

Liquid's design thesis is that the next decade of generative AI will run primarily on-device — laptops, phones, automotive electronic control units, industrial controllers, embedded coding assistants — rather than in hyperscaler data centers, and that the dominant transformer architecture is poorly matched to that constraint. Every LFM design choice follows from that thesis: a sparse mixture-of-experts architecture that keeps active parameters small relative to total parameters, an open-weight license that lets customers fine-tune and deploy without per-token API fees, and day-one support for the runtimes that actually ship to end-user hardware.

Tip

Try Liquid LFM: liquid.ai — open weights on Hugging Face; LEAP customization platform for enterprise fine-tuning and deployment; Liquid Apollo consumer on-device app for hands-on exploration. No API key or hosted endpoint required to run LFM2.5-8B-A1B locally.

The Headline Release — LFM2.5-8B-A1B

The current flagship is LFM2.5-8B-A1B, released open-weight in late May 2026. The naming convention encodes the architecture: 8 billion total parameters (8B) with roughly 1 billion active per token (A1B) via the mixture-of-experts routing.
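The naming convention can be read mechanically. The sketch below is a minimal illustration, assuming names follow the `<family>-<total>B-A<active>B` shape described above; the `ModelSize` type and `parse_model_name` helper are hypothetical names introduced here, not part of any Liquid tooling.

```python
import re
from typing import NamedTuple


class ModelSize(NamedTuple):
    family: str           # e.g. "LFM2.5"
    total_params: float   # total parameters, in billions
    active_params: float  # active parameters per token, in billions


# Matches names shaped like "<family>-<N>B-A<M>B"; other naming schemes are not handled.
_NAME_RE = re.compile(
    r"^(?P<family>.+?)-(?P<total>\d+(?:\.\d+)?)B-A(?P<active>\d+(?:\.\d+)?)B$"
)


def parse_model_name(name: str) -> ModelSize:
    """Split a model name into family, total size, and active size (hypothetical helper)."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized model name: {name!r}")
    return ModelSize(match["family"], float(match["total"]), float(match["active"]))


size = parse_model_name("LFM2.5-8B-A1B")
print(size)
print(f"active fraction: {size.active_params / size.total_params:.3f}")
```

For LFM2.5-8B-A1B this gives 8 billion total and 1 billion active, so roughly one eighth of the parameters participate in any single token.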

Specification | LFM2.5-8B-A1B
Total parameters | 8 billion (MoE)
Active parameters per token | Roughly 1 billion
Pretraining tokens | 38 trillion
Context window | 128,000 tokens (post-trained from 32,000)
Vocabulary | 128,000 tokens (doubled from prior generation for non-Latin language efficiency)
License | Open-weight, no use restrictions
Runtime support | llama.cpp, MLX, vLLM, SGLang, ONNX (day one)

The release is open-weight with no use restrictions — Liquid explicitly avoided the modified-MIT or Llama-style commercial caveats that other labs attach to their weights. Customers can download, fine-tune, embed in commercial products, and redistribute derivatives without negotiating with Liquid.

💡Key Concept

Mixture-of-experts at the edge. An 8 billion total / 1 billion active MoE behaves like a roughly 1 billion-parameter dense model in per-token compute, because only the routed experts run for each token. Memory is a different matter: every expert has to be available for the router to choose from, so the weight footprint tracks the 8B total rather than the 1B active count, and it is typically quantization that brings it within laptop or phone RAM. The total knowledge capacity is likewise closer to the 8B. The architecture is what makes credible flagship-mini quality on consumer hardware mathematically possible; the day-one runtime support is what makes it shippable.
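A back-of-envelope sketch helps keep the two quantities apart: weight memory scales with total parameters, while per-token compute scales with active parameters. The bit widths below are illustrative assumptions (Liquid's actual quantized file sizes are not stated here), and the 2-FLOPs-per-parameter figure is a common rule of thumb that ignores attention, routing, and KV-cache costs.

```python
def weight_memory_gb(total_params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone (no KV cache, activations, or runtime overhead)."""
    total_bytes = total_params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def flops_per_token(active_params_billions: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per active parameter per token."""
    return 2 * active_params_billions * 1e9


TOTAL_B, ACTIVE_B = 8.0, 1.0  # read off the model name LFM2.5-8B-A1B

# Memory depends on TOTAL parameters: every expert must be loaded.
for bits in (16, 8, 4):  # assumed precisions, for illustration only
    print(f"{bits}-bit weights: ~{weight_memory_gb(TOTAL_B, bits):.1f} GB")

# Compute depends on ACTIVE parameters: only the routed experts run.
dense_8b = flops_per_token(TOTAL_B)
moe_a1b = flops_per_token(ACTIVE_B)
print(f"per-token compute vs. a dense 8B: {moe_a1b / dense_8b:.3f}x")
```

Under these assumptions the weights need about as much memory as any other 8B model at the same precision, while each generated token costs about one eighth of the dense-8B compute.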

On-Device Performance

Liquid reports the following inference rates on consumer hardware — the headline number for an on-device foundation model is not aggregate FLOPs but tokens-per-second on devices that real users own:

Device | Tokens per second
Apple M5 Max (laptop) | Roughly 253
AMD Ryzen AI Max Plus (laptop) | Roughly 146
Flagship smartphone (high-end Snapdragon / Apple A-series) | Roughly 30

For context: 30 tokens per second is well above human reading speed and is enough for conversational chat agents, dictation, real-time on-device tutoring, and most agent loops where the tokens-per-second floor is set by the user reading the previous turn before the next one is needed. Laptop-class generation at 146 to 253 tokens per second is in the same throughput band as cloud-hosted flagship-mini tiers like GPT-5.5-mini, Claude Haiku 4.5, and Gemini Flash.
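The reading-speed comparison can be made concrete with a little arithmetic. The conversion below is a rough sketch: the 0.75 words-per-token ratio is a common rule of thumb for English text and varies by tokenizer and language, and 250 words per minute is an assumed typical silent-reading speed. Neither figure comes from Liquid.

```python
WORDS_PER_TOKEN = 0.75  # assumption: rough English average, tokenizer-dependent
READING_WPM = 250       # assumption: typical adult silent-reading speed


def words_per_minute(tokens_per_second: float,
                     words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Convert a generation rate in tokens/second to words/minute."""
    return tokens_per_second * words_per_token * 60


# Throughput figures as reported in the table above.
reported = {
    "Flagship smartphone": 30,
    "AMD Ryzen AI Max Plus": 146,
    "Apple M5 Max": 253,
}

for device, tps in reported.items():
    wpm = words_per_minute(tps)
    print(f"{device}: ~{wpm:.0f} words/min, ~{wpm / READING_WPM:.1f}x reading speed")
```

Under these assumptions, 30 tokens per second works out to about 1,350 words per minute, several times faster than a reader can consume the output.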

Benchmark Performance

LFM2.5-8B-A1B's reported benchmark wins relative to its predecessor LFM2 line:

Benchmark | LFM2.5-8B-A1B | LFM2 baseline
AA-Omniscience Index | -24.70 | -78.42
Non-Hallucination Rate | 63.47% | 7.46%
IFEval (instruction following) | 91.84 | not listed
MATH500 | 88.76 | not listed
Tau-Squared Telecom (agentic) | 88.07 | not listed

The jump in non-hallucination rate, from 7.46% to 63.47%, is the single most consequential delta for on-device deployment: a 7.46% non-hallucination rate implies hallucination on more than 90% of the measured cases, which sinks any application that puts model output in front of an end user without a human-in-the-loop validator. The instruction-following and agentic-task scores (IFEval, Tau-Squared) put LFM2.5-8B-A1B in the competitive band for keyboard agents, voice assistants, and lightweight tool-use loops at the edge.

⚠️Warning

Open-weight benchmark caveat. Vendor-published benchmarks reflect the configuration the vendor ran. On-device deployments hit quantization, runtime, and memory-constraint trade-offs that can shift scores by several points in either direction. Validate against your actual deployment configuration — same runtime, same quantization, same context length — before committing to a production pattern.
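One lightweight way to act on this caveat is to record the configuration behind any published score and diff it against what you actually deploy. The sketch below is illustrative only: `EvalConfig` and `config_mismatches` are hypothetical names, and the example values are placeholders rather than Liquid's real benchmark settings.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EvalConfig:
    runtime: str         # e.g. "llama.cpp", "vllm"
    quantization: str    # e.g. "bf16", "q4"
    context_length: int  # tokens


def config_mismatches(benchmarked: EvalConfig, deployed: EvalConfig) -> dict[str, tuple]:
    """Return {field: (benchmarked_value, deployed_value)} for every field that differs."""
    return {
        f.name: (getattr(benchmarked, f.name), getattr(deployed, f.name))
        for f in fields(EvalConfig)
        if getattr(benchmarked, f.name) != getattr(deployed, f.name)
    }


# Placeholder values for illustration; substitute the real settings on both sides.
benchmarked = EvalConfig(runtime="vllm", quantization="bf16", context_length=32_000)
deployed = EvalConfig(runtime="llama.cpp", quantization="q4", context_length=8_000)

for name, (theirs, ours) in config_mismatches(benchmarked, deployed).items():
    print(f"{name}: benchmark used {theirs!r}, deployment uses {ours!r} -> re-run the eval")
```

Any non-empty result is a signal that the published score may not transfer and the benchmark should be re-run on the deployment configuration.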

How Liquid Compares

Model | Total params | Active per token | Strongest fit
Liquid LFM2.5-8B-A1B | 8 billion | Roughly 1 billion | On-device inference; open-weight with no use restrictions
Google Gemma 3 (4B / 9B variants) | 4-9 billion | Same as total | On-device with Google-tooling integration
Microsoft Phi-4 | 14 billion | Same as total | On-device with strong reasoning per parameter
Apple on-device Foundation Models | 3 billion | Same as total | Apple-only; tight OS integration
Mistral Ministral (3B / 8B) | 3-8 billion | Same as total | EU-aligned edge deployment

Liquid's differentiation is the MoE-at-the-edge design choice — smaller active-parameter count for the same memory footprint as competitors of similar total size, and the no-use-restrictions open-weight license that contrasts with Llama's Meta-imposed terms and Gemma's Google-aligned ones.

Beyond the LFM Line

Liquid sells two products built on top of the LFM checkpoints, mostly aimed at enterprise customers who need more than the open-weight download alone:

  • Liquid LEAP — customization and deployment platform. Enterprise teams fine-tune LFM checkpoints against private data, validate, package the artifacts, and ship them to the customer's devices. Comparable to Hugging Face Inference Endpoints crossed with an on-device deployment toolchain.
  • Liquid Apollo — consumer-facing on-device AI application. The headline use case is hands-on exploration: install Apollo, run an LFM locally, and validate that on-device inference at this parameter band actually feels usable before committing to a deployment.

Pricing

Open Weights: Free
  • Download LFM2.5-8B-A1B from Hugging Face
  • No use restrictions
  • Self-host, fine-tune, redistribute derivatives
LEAP: Contact sales
  • Enterprise fine-tuning and deployment platform
  • Custom on-device packaging
  • Private-data fine-tuning support
Liquid Apollo: Consumer app
  • On-device LFM runtime
  • Hands-on exploration before deployment commitment

The open-weight tier is the centerpiece — Liquid's revenue model is the enterprise LEAP platform plus consulting and custom-model engagements, not per-token API fees on the base models. For most evaluators the right starting point is downloading LFM2.5-8B-A1B and testing it in your target runtime before any commercial conversation.

Strengths

  • On-device first: Architecture, runtime support, and quantization choices all optimized for laptop and phone inference rather than data-center batching
  • Open-weight with no use restrictions: Substantially more permissive than Llama (Meta acceptable use), Gemma (Google AUP), or even modified-MIT releases
  • Strong non-hallucination rate: 63.47% non-hallucination is in the band that actually lets you put on-device output in front of users
  • Day-one ecosystem support: llama.cpp, MLX, vLLM, SGLang, and ONNX from launch — no waiting for community ports
  • MIT CSAIL pedigree: Founders include Daniela Rus (CSAIL director), grounding the lab in academic depth rather than pure product engineering
  • Credible alternative to hyperscaler inference cost: Removes the largest line item for AI-product companies whose workloads don't require absolute frontier capability

Limitations & Considerations

  • Not a frontier-capability model: Claude Opus 4.8, GPT-5.5, and Gemini 3.5 Pro maintain large leads at the capability ceiling; LFM2.5-8B-A1B is the on-device deployment alternative, not the flagship competitor
  • Smaller community than Llama or Mistral: Fewer English-language tutorials, integration guides, and third-party tooling than the largest open-weight communities — though day-one major-runtime support partially compensates
  • No hosted API: If you want a managed cloud endpoint without self-hosting, you'll need to go through a third-party host that picks up the open weights — Liquid does not sell hosted inference
  • Active-parameter MoE is harder to reason about than dense: Performance characteristics on bespoke runtimes (custom kernels, FPGA, NPUs) require empirical validation rather than scaling-law extrapolation from dense models
  • Hardware variability matters: Reported tokens-per-second figures come from flagship devices; older or budget hardware will land well below the headline numbers

Best Use Cases

Task | Why LFM2.5-8B-A1B
Keyboard agents and on-device autocomplete | Sub-second latency on phones; no per-token cloud fee
Voice assistants embedded in products | Local inference avoids round-trip latency to a cloud endpoint
Industrial-controller and embedded copilots | Open weights deployable on constrained hardware without phoning home
Data-residency-constrained workloads | Nothing leaves the device; passes the strictest data-sovereignty audits
Cost-sensitive AI-product companies | Removes the largest cost line for applications below the frontier capability bar
Edge agents for in-car, in-factory, in-field deployment | Open-weight + day-one runtime support fits embedded-engineering workflows

When to choose alternatives:

  • Frontier capability ceiling for hosted use → Claude Opus 4.8, GPT-5.5, Gemini 3.5 Pro
  • Strongest open-weight community and tooling → Mistral Large 3 or Llama 4 derivatives
  • Tightest Apple-platform integration → Apple on-device Foundation Models
  • Hosted inference with no self-hosting overhead → any flagship-mini hosted endpoint

Getting Started

  1. Download the weights from Hugging Face — search for LiquidAI/LFM2.5-8B-A1B (base) or the post-trained variant
  2. Pick a runtime — llama.cpp for CPU + GGUF deployments, MLX for Apple Silicon, vLLM or SGLang for batched server-side inference, ONNX for cross-platform mobile / embedded
  3. Validate on your target device with a representative prompt set: vendor-reported tokens-per-second numbers come from flagship devices, so verify your own hardware before committing
  4. For private-data fine-tuning without rolling your own pipeline, contact Liquid about the LEAP platform
  5. For hands-on consumer exploration, install Liquid Apollo and run the model on your phone before committing to an embedded deployment
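The validation step above can be sketched as a small, runtime-agnostic harness. This is a minimal illustration, not a substitute for a runtime's own benchmarking tools: `measure_tokens_per_second` is a hypothetical helper, and the `generate` callable stands in for whatever binding your chosen runtime exposes (here it only needs to return the number of tokens produced). It times the whole call, so prompt processing is included in the rate.

```python
import time
from statistics import median
from typing import Callable, Iterable


def measure_tokens_per_second(
    generate: Callable[[str], int],
    prompts: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Median end-to-end tokens/second over a prompt set.

    `generate` runs one prompt and returns how many tokens were produced.
    """
    rates = []
    for prompt in prompts:
        start = clock()
        n_tokens = generate(prompt)
        elapsed = clock() - start
        if n_tokens > 0 and elapsed > 0:
            rates.append(n_tokens / elapsed)
    if not rates:
        raise ValueError("no successful generations to measure")
    return median(rates)


def fake_generate(prompt: str) -> int:
    """Stand-in for a real runtime call; pretends to emit one token per word."""
    time.sleep(0.01)
    return len(prompt.split())


prompts = ["hello there", "a slightly longer prompt goes here"]
print(f"median rate: {measure_tokens_per_second(fake_generate, prompts):.1f} tok/s")
```

Swapping `fake_generate` for a wrapper around your actual runtime, and `prompts` for a representative sample of production inputs, gives a number that can be compared against the vendor-reported figures for your own hardware. The median is used so that one slow outlier (for example, a cold first run) does not dominate the result.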

Key Takeaways

  • Liquid LFM is the family of open-weight Liquid Foundation Models from Cambridge-based MIT CSAIL spinoff Liquid AI, engineered for on-device inference rather than data-center batching
  • The current flagship LFM2.5-8B-A1B is an 8 billion-parameter mixture-of-experts model with roughly 1 billion active parameters per token, pretrained on 38 trillion tokens, with a 128,000-token context window
  • The model ships open-weight with no use restrictions, plus day-one runtime support across llama.cpp, MLX, vLLM, SGLang, and ONNX — closing the historical integration gap for open-weight on-device deployment
  • Reported throughput is roughly 253 tokens per second on an Apple M5 Max, 146 tokens per second on a Ryzen AI Max Plus, and around 30 tokens per second on flagship smartphones — credible local inference in the same band as cloud-hosted flagship-mini tiers
  • Best suited for keyboard agents, voice assistants, embedded copilots, data-residency-constrained workloads, and any AI-product application where hyperscaler inference cost is the binding constraint
  • Liquid also sells the LEAP customization and deployment platform for enterprise teams and the Liquid Apollo consumer app for hands-on exploration; revenue model is enterprise-platform, not per-token API fees
  • Sits alongside Apple on-device Foundation Models, Microsoft Phi, Google Gemma, and Mistral Ministral as one of the labs building explicitly for the on-device constraint — Liquid's differentiation is the MoE-at-the-edge design and the no-use-restrictions license
