Name: Liquid LFM
Author: Liquid AI

Learning Objectives

Understand how Liquid AI's Liquid Foundation Models differ from frontier transformer-based LLMs in architecture and intended deployment
Identify the headline 2026 release LFM2.5-8B-A1B and its parameter, training, and runtime characteristics
Evaluate when a Liquid LFM is the right choice for on-device, edge, or data-residency-constrained AI workloads

What Is Liquid LFM?

Liquid LFM — the Liquid Foundation Model line — is the family of open-weight language models built by Liquid AI, a Cambridge, Massachusetts foundation-model lab spun out of MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) in 2023. The four founders — Ramin Hasani (CEO), Mathias Lechner, Alexander Amini, and Daniela Rus, the CSAIL director — built the company around their academic work on liquid neural networks, a class of continuous-time models originally developed for robotic control and adapted into the LFM line.

Liquid's design thesis is that the next decade of generative AI will run primarily on-device — laptops, phones, automotive electronic control units, industrial controllers, embedded coding assistants — rather than in hyperscaler data centers, and that the dominant transformer architecture is poorly matched to that constraint. Every LFM design choice follows from that thesis: a sparse mixture-of-experts architecture that keeps active parameters small relative to total parameters, an open-weight license that lets customers fine-tune and deploy without per-token API fees, and day-one support for the runtimes that actually ship to end-user hardware.

✅Tip

Try Liquid LFM: liquid.ai — open weights on Hugging Face; LEAP customization platform for enterprise fine-tuning and deployment; Liquid Apollo consumer on-device app for hands-on exploration. No API key or hosted endpoint required to run LFM2.5-8B-A1B locally.

The Headline Release — LFM2.5-8B-A1B

The current flagship is LFM2.5-8B-A1B, released open-weight in late May 2026. The naming convention encodes the architecture: 8 billion total parameters (8B) with roughly 1 billion active per token (A1B) via the mixture-of-experts routing.

Specification	LFM2.5-8B-A1B
Total parameters	8 billion (MoE)
Active parameters per token	Roughly 1 billion
Pretraining tokens	38 trillion
Context window	128,000 tokens (post-trained from 32,000)
Vocabulary	128,000 tokens (doubled from prior generation for non-Latin language efficiency)
License	Open-weight, no use restrictions
Runtime support	llama.cpp, MLX, vLLM, SGLang, ONNX (day one)
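The gap between total and active parameters in the table above comes from routing: for each token, a router scores the experts and only the top few run. The following is a minimal, generic top-k routing sketch in plain Python. It is not Liquid's implementation; the expert count, the value of k, the router scores, and the toy "experts" are all made up for illustration.

```python
import math

def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_layer(x, experts, router_logits, k):
    """Combine the outputs of only the k selected experts for one token."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)  # experts that were not selected never run
        out = [o + weight * v for o, v in zip(out, y)]
    return out

# Hypothetical toy setup: 8 experts, each one just scales its input.
NUM_EXPERTS = 8
experts = [lambda x, s=s: [s * v for v in x] for s in range(1, NUM_EXPERTS + 1)]
logits = [0.1, 2.0, -1.0, 0.5, 2.0, 0.0, -0.5, 1.0]  # made-up router scores

selected = top_k_route(logits, k=2)
active_fraction = len(selected) / NUM_EXPERTS
token_out = moe_layer([1.0, -1.0], experts, logits, k=2)

print("experts used:", [i for i, _ in selected])
print("fraction of experts active:", active_fraction)
print("token output:", token_out)
```

In this toy, two of eight experts run per token, which is the same idea as roughly 1 billion of 8 billion parameters being active, although real models also have shared, non-expert layers that always run.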

The release is open-weight with no use restrictions — Liquid explicitly avoided the modified-MIT or Llama-style commercial caveats that other labs attach to their weights. Customers can download, fine-tune, embed in commercial products, and redistribute derivatives without negotiating with Liquid.

💡Key Concept

Mixture-of-experts at the edge. An 8 billion total / 1 billion active MoE behaves like a dense model of roughly 1 billion parameters in per-token compute: each generated token touches only about 1 billion weights, so decode speed on a laptop or phone is closer to a dense 1B than a dense 8B. The weights that must be stored, however, are still those of an 8B model, because every expert has to be available to the router, and the total knowledge capacity is likewise closer to the 8B. The architecture is what makes credible flagship-mini quality on consumer hardware mathematically possible; the day-one runtime support is what makes it shippable.
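In a mixture-of-experts model, weight storage scales with total parameters (every expert must be loadable), while per-token compute scales with active parameters. A back-of-the-envelope sketch of both quantities follows. It assumes the common rule of thumb of about 2 FLOPs per active parameter per generated token, counts weights only (no KV cache, activations, or runtime overhead), and uses illustrative bit-widths rather than any quantization Liquid has published.

```python
TOTAL_PARAMS = 8e9    # from the model name: 8B total
ACTIVE_PARAMS = 1e9   # from the model name: roughly 1B active per token

def weight_memory_gb(total_params, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return total_params * bits_per_weight / 8 / 1e9

def flops_per_token(active_params):
    """Rule-of-thumb decode cost: ~2 FLOPs per active parameter per token."""
    return 2 * active_params

for bits in (16, 8, 4):  # illustrative bit-widths, not a published config
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB")

moe_cost = flops_per_token(ACTIVE_PARAMS)
dense_8b_cost = flops_per_token(TOTAL_PARAMS)
print(f"per-token compute vs dense 8B: {moe_cost / dense_8b_cost:.3f}x")
```

Under these assumptions the model stores like an 8B model but computes each token like a 1B model, which is why quantization choice decides whether it fits on a device and the active-parameter count decides how fast it runs.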

On-Device Performance

Liquid reports the following inference rates on consumer hardware — the headline number for an on-device foundation model is not aggregate FLOPs but tokens-per-second on devices that real users own:

Device	Tokens per second
Apple M5 Max (laptop)	Roughly 253
AMD Ryzen AI Max Plus (laptop)	Roughly 146
Flagship smartphone (high-end Snapdragon / Apple A-series)	Roughly 30

For context: 30 tokens per second is well above human reading speed and is enough for conversational chat agents, dictation, real-time on-device tutoring, and most agent loops in which the user reads the previous turn before the next one is needed, so reading speed rather than generation speed sets the pace. Laptop-class performance at 146 to 253 tokens per second is in the same band as cloud-hosted flagship-mini tiers like GPT-5.5-mini, Claude Haiku 4.5, and Gemini Flash.
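To put the reported rates on a human scale, the sketch below converts tokens per second into words per minute and into the wait for a reply. The conversion factor (about 0.75 English words per token), the 250 words-per-minute reading speed, and the 300-token reply length are rough assumptions for illustration, not figures from Liquid.

```python
WORDS_PER_TOKEN = 0.75   # assumption: rough average for English text
READING_WPM = 250        # assumption: typical adult silent-reading speed
REPLY_TOKENS = 300       # assumption: a medium-length chat reply

# Reported throughput figures from the table above.
REPORTED_TPS = {
    "Apple M5 Max": 253,
    "AMD Ryzen AI Max Plus": 146,
    "Flagship smartphone": 30,
}

def words_per_minute(tokens_per_second):
    """Convert generation speed into an approximate words-per-minute rate."""
    return tokens_per_second * WORDS_PER_TOKEN * 60

def seconds_for_reply(tokens_per_second, reply_tokens):
    """Time to generate a full reply of the given length."""
    return reply_tokens / tokens_per_second

for device, tps in REPORTED_TPS.items():
    wpm = words_per_minute(tps)
    wait = seconds_for_reply(tps, REPLY_TOKENS)
    print(f"{device}: ~{wpm:.0f} wpm "
          f"({wpm / READING_WPM:.1f}x reading speed), "
          f"{wait:.1f} s for {REPLY_TOKENS} tokens")
```

Even the smartphone figure generates text several times faster than the assumed reading speed, so for streamed chat the user, not the model, is the bottleneck; the wait matters more for agent steps that need the full output before continuing.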

Benchmark Performance

LFM2.5-8B-A1B's reported benchmark scores, with the predecessor LFM2 line as a baseline where one is listed:

Benchmark	LFM2.5-8B-A1B	LFM2 baseline
AA-Omniscience Index	-24.70	-78.42
Non-Hallucination Rate	63.47%	7.46%
IFEval (instruction following)	91.84	—
MATH500	88.76	—
Tau-Squared Telecom (agentic)	88.07	—

The jump in non-hallucination rate, from 7.46% to 63.47%, is the single most consequential delta for on-device deployment: a baseline that hallucinates in more than 90% of measured cases sinks any application that puts model output in front of an end user without a human-in-the-loop validator. The instruction-following and agentic-task scores (IFEval, Tau-Squared) put LFM2.5-8B-A1B in the competitive band for keyboard agents, voice assistants, and lightweight tool-use loops at the edge.
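The size of that change is easier to judge as a hallucination rate. The short calculation below assumes the hallucination rate is simply the complement of the reported non-hallucination rate on the same set of cases; the benchmark's exact denominator is not stated here, so treat the result as indicative.

```python
def hallucination_rate(non_hallucination_pct):
    """Complement of the non-hallucination rate, in percent (assumed)."""
    return 100.0 - non_hallucination_pct

lfm2 = hallucination_rate(7.46)     # LFM2 baseline from the table
lfm25 = hallucination_rate(63.47)   # LFM2.5-8B-A1B from the table

point_drop = lfm2 - lfm25           # absolute change in percentage points
relative_drop = point_drop / lfm2   # share of baseline hallucinations removed

print(f"hallucination rate: {lfm2:.2f}% -> {lfm25:.2f}%")
print(f"drop: {point_drop:.2f} points ({relative_drop:.1%} relative)")
```

Read this way, the new model still hallucinates in a substantial share of the measured cases, which is why validation on the target workload remains necessary.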

⚠️Warning

Open-weight benchmark caveat. Vendor-published benchmarks reflect the configuration the vendor ran. On-device deployments hit quantization, runtime, and memory-constraint trade-offs that can shift scores by several points in either direction. Validate against your actual deployment configuration — same runtime, same quantization, same context length — before committing to a production pattern.
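One lightweight way to act on this is to record the configuration next to every score and refuse to compare numbers whose configurations differ. The sketch below is one possible shape for that check; the `EvalConfig` fields and all values shown are hypothetical placeholders, not the settings Liquid used.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class EvalConfig:
    """The three knobs the warning above says must match."""
    runtime: str
    quantization: str
    context_length: int

def config_mismatches(reference, deployment):
    """Return {field: (reference_value, deployment_value)} for each difference."""
    return {
        f.name: (getattr(reference, f.name), getattr(deployment, f.name))
        for f in fields(reference)
        if getattr(reference, f.name) != getattr(deployment, f.name)
    }

# Placeholder values only: neither configuration comes from Liquid's reports.
benchmarked = EvalConfig(runtime="vllm", quantization="bf16", context_length=32_000)
shipping = EvalConfig(runtime="llama.cpp", quantization="q4", context_length=8_000)

diff = config_mismatches(benchmarked, shipping)
for name, (ref, dep) in diff.items():
    print(f"{name}: benchmarked with {ref!r}, shipping with {dep!r}")
if diff:
    print("Scores are not directly comparable; re-run the eval on the shipping config.")
```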

How Liquid Compares

Model	Total params	Active per token	Strongest fit
Liquid LFM2.5-8B-A1B	8 billion	Roughly 1 billion	On-device inference; open-weight with no use restrictions
Google Gemma 3 (4B / 12B variants)	4-12 billion	Same as total	On-device with Google-tooling integration
Microsoft Phi-4	14 billion	Same as total	On-device with strong reasoning per parameter
Apple on-device Foundation Models	3 billion	Same as total	Apple-only; tight OS integration
Mistral Ministral (3B / 8B)	3-8 billion	Same as total	EU-aligned edge deployment

Liquid's differentiation is the MoE-at-the-edge design choice — smaller active-parameter count for the same memory footprint as competitors of similar total size, and the no-use-restrictions open-weight license that contrasts with Llama's Meta-imposed terms and Gemma's Google-aligned ones.

Beyond the LFM Line

Liquid sells two products built on top of the LFM checkpoints, mostly aimed at enterprise customers who need more than the open-weight download alone:

Liquid LEAP — customization and deployment platform. Enterprise teams fine-tune LFM checkpoints against private data, validate, package the artifacts, and ship them to the customer's devices. Comparable to Hugging Face Inference Endpoints crossed with an on-device deployment toolchain.
Liquid Apollo — consumer-facing on-device AI application. The headline use case is hands-on exploration: install Apollo, run an LFM locally, and validate that on-device inference at this parameter band actually feels usable before committing to a deployment.

Pricing

Plan	Price	Features
Open Weights	Free	Download LFM2.5-8B-A1B from Hugging Face; no use restrictions; self-host, fine-tune, redistribute derivatives
LEAP	Contact sales	Enterprise fine-tuning and deployment platform; custom on-device packaging; private-data fine-tuning support
Liquid Apollo	Consumer app	On-device LFM runtime; hands-on exploration before deployment commitment

The open-weight tier is the centerpiece — Liquid's revenue model is the enterprise LEAP platform plus consulting and custom-model engagements, not per-token API fees on the base models. For most evaluators the right starting point is downloading LFM2.5-8B-A1B and testing it in your target runtime before any commercial conversation.

Strengths

On-device first: Architecture, runtime support, and quantization choices all optimized for laptop and phone inference rather than data-center batching
Open-weight with no use restrictions: Substantially more permissive than Llama (Meta acceptable use), Gemma (Google AUP), or even modified-MIT releases
Strong non-hallucination rate: 63.47% non-hallucination is in the band that actually lets you put on-device output in front of users
Day-one ecosystem support: llama.cpp, MLX, vLLM, SGLang, and ONNX from launch — no waiting for community ports
MIT CSAIL pedigree: Founders include Daniela Rus (CSAIL director), grounding the lab in academic depth rather than pure product engineering
Credible alternative to hyperscaler inference cost: Removes the largest line item for AI-product companies whose workloads don't require absolute frontier capability

Limitations & Considerations

Not a frontier-capability model: Claude Opus 4.8, GPT-5.5, and Gemini 3.5 Pro maintain large leads at the capability ceiling; LFM2.5-8B-A1B is the on-device deployment alternative, not the flagship competitor
Smaller community than Llama or Mistral: Fewer English-language tutorials, integration guides, and third-party tooling than the largest open-weight communities — though day-one major-runtime support partially compensates
No hosted API: If you want a managed cloud endpoint without self-hosting, you'll need to go through a third-party host that picks up the open weights — Liquid does not sell hosted inference
Active-parameter MoE is harder to reason about than dense: Performance characteristics on bespoke runtimes (custom kernels, FPGA, NPUs) require empirical validation rather than scaling-law extrapolation from dense models
Hardware variability matters: Reported tokens-per-second numbers were measured on flagship devices; older or budget hardware will land well below the headline figures

Best Use Cases

Task	Why LFM2.5-8B-A1B
Keyboard agents and on-device autocomplete	Sub-second latency on phones; no per-token cloud fee
Voice assistants embedded in products	Local inference avoids round-trip latency to a cloud endpoint
Industrial-controller and embedded copilots	Open weights deployable on constrained hardware without phoning home
Data-residency-constrained workloads	Nothing leaves the device, which simplifies even strict data-sovereignty audits
Cost-sensitive AI-product companies	Removes the largest cost line for applications below the frontier capability bar
Edge agents for in-car, in-factory, in-field deployment	Open-weight + day-one runtime support fits embedded-engineering workflows

When to choose alternatives:

Frontier capability ceiling for hosted use → Claude Opus 4.8, GPT-5.5, Gemini 3.5 Pro
Strongest open-weight community and tooling → Mistral Large 3 or Llama 4 derivatives
Tightest Apple-platform integration → Apple on-device Foundation Models
Hosted inference with no self-hosting overhead → any flagship-mini hosted endpoint

Getting Started

Download the weights from Hugging Face — search for LiquidAI/LFM2.5-8B-A1B (base) or the post-trained variant
Pick a runtime — llama.cpp for CPU + GGUF deployments, MLX for Apple Silicon, vLLM or SGLang for batched server-side inference, ONNX for cross-platform mobile / embedded
Validate on your target device with a representative prompt set: vendor-reported tokens-per-second numbers come from flagship devices, so verify your own hardware before committing
For private-data fine-tuning without rolling your own pipeline, contact Liquid about the LEAP platform
For hands-on consumer exploration, install Liquid Apollo and run the model on your phone before committing to an embedded deployment
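The validation step above can be sketched as a small, runtime-agnostic harness. It assumes you wrap whichever runtime you chose in a function that takes a prompt and returns the number of tokens it generated; `fake_generate` below is a stand-in for that wrapper, not a real runtime call, and the harness and lookup-table names are hypothetical.

```python
import time
from statistics import median

# Runtime suggestions as listed in the steps above.
RUNTIME_BY_TARGET = {
    "cpu_gguf": "llama.cpp",
    "apple_silicon": "MLX",
    "batched_server": "vLLM or SGLang",
    "mobile_embedded": "ONNX",
}

def measure_tokens_per_second(generate, prompts, clock=time.perf_counter):
    """Time `generate(prompt) -> generated_token_count` for each prompt.

    Returns the median, minimum, and maximum tokens per second observed.
    `clock` is injectable so the harness can be tested deterministically.
    """
    rates = []
    for prompt in prompts:
        start = clock()
        n_tokens = generate(prompt)
        elapsed = clock() - start
        if elapsed > 0 and n_tokens > 0:
            rates.append(n_tokens / elapsed)
    if not rates:
        raise ValueError("no successful generations to measure")
    return {"median": median(rates), "min": min(rates), "max": max(rates)}

def fake_generate(prompt):
    """Stand-in for a real runtime call; pretends to emit one token per word."""
    time.sleep(0.01)  # simulate generation latency
    return len(prompt.split())

# Replace these with prompts that look like your real workload.
prompts = ["summarize this maintenance log entry", "turn left at the next exit"]
stats = measure_tokens_per_second(fake_generate, prompts)
print(f"suggested runtime for Apple Silicon: {RUNTIME_BY_TARGET['apple_silicon']}")
print(f"median tokens/s on this machine: {stats['median']:.1f}")
```

Reporting the minimum alongside the median is deliberate: on thermally constrained phones and laptops, sustained throughput can fall below the first-run figure, and the slowest case is usually the one users notice.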

Key Takeaways

Liquid LFM is the family of open-weight Liquid Foundation Models from Cambridge-based MIT CSAIL spinoff Liquid AI, engineered for on-device inference rather than data-center batching
The current flagship LFM2.5-8B-A1B is an 8 billion-parameter mixture-of-experts model with roughly 1 billion active parameters per token, pretrained on 38 trillion tokens, with a 128,000-token context window
The model ships open-weight with no use restrictions, plus day-one runtime support across llama.cpp, MLX, vLLM, SGLang, and ONNX — closing the historical integration gap for open-weight on-device deployment
Reported throughput is roughly 253 tokens per second on an Apple M5 Max, 146 tokens per second on a Ryzen AI Max Plus, and around 30 tokens per second on flagship smartphones — credible local inference in the same band as cloud-hosted flagship-mini tiers
Best suited for keyboard agents, voice assistants, embedded copilots, data-residency-constrained workloads, and any AI-product application where hyperscaler inference cost is the binding constraint
Liquid also sells the LEAP customization and deployment platform for enterprise teams and the Liquid Apollo consumer app for hands-on exploration; revenue model is enterprise-platform, not per-token API fees
Sits alongside Apple on-device Foundation Models, Microsoft Phi, Google Gemma, and Mistral Ministral as one of the labs building explicitly for the on-device constraint — Liquid's differentiation is the MoE-at-the-edge design and the no-use-restrictions license
