MinishLab AI Tools & Models

Learn About MinishLab's AI Products

📋About MinishLab

Updated August 1, 2026

MinishLab (branded "Minish") is a two-person non-profit open-source lab focused on natural language processing, founded in 2024 by Thomas van Dongen and Stéphan Tulkens. The lab's tagline is "Solving big problems with small models," and its core philosophy is that "if you make models fast enough, you unlock new possibilities." The benchmark targets the lab routinely meets reflect that philosophy: embedding the entirety of English Wikipedia in roughly five minutes, classifying tens of thousands of documents per second on CPU, and deduplicating large datasets in minutes.

MinishLab's published work centers on a family of "potion" static embedding models — including potion-base-2M, potion-base-4M, potion-base-8M, potion-multilingual-128M, potion-retrieval-32M, and potion-code-16M — which power most of the lab's downstream tools. The potion-base-8M and potion-base-4M models alone have crossed roughly 700,000 downloads each on Hugging Face, and the full MinishLab catalog has surpassed 4 million combined package downloads. Across GitHub, the lab's repositories have accumulated over 5,500 stars.

The lab's product portfolio sits largely in the agentic-developer tooling space and comprises five projects: Semble (code search optimized for AI agents, using roughly 98% fewer tokens than grep-plus-read pipelines), Model2Vec (the static embedding approach behind the potion models, offering state-of-the-art speed at a fraction of the compute cost of sentence-transformers), SemHash (multimodal semantic deduplication and dataset filtering), Vicinity (a unified interface across approximate-nearest-neighbor backends), and Tokenlearn (a method for pre-training static word embeddings). Tools and models are released under permissive open-source licenses. The lab funds itself through community sponsorships and grants rather than traditional venture capital, a deliberate choice that keeps the research direction focused on speed and accessibility rather than commercial scale.
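
To make the idea of static embeddings and semantic deduplication concrete, here is a minimal toy sketch in plain Python. It is not the Model2Vec or SemHash API: the vocabulary `TOY_VECTORS`, the helper names `embed`, `cosine`, and `deduplicate`, and the 0.9 threshold are all hypothetical and chosen only for illustration. It assumes a static embedding can be approximated as the mean of fixed per-token vectors, and that near-duplicates can be dropped by comparing cosine similarity against a threshold.

```python
import math

# Hypothetical toy vocabulary: each token maps to one fixed (static) vector.
# Real static embedding models use far larger tables and learned vectors.
TOY_VECTORS = {
    "fast": [1.0, 0.0, 0.0],
    "quick": [0.9, 0.1, 0.0],
    "model": [0.0, 1.0, 0.0],
    "models": [0.0, 0.9, 0.1],
    "slow": [-1.0, 0.0, 0.0],
    "cat": [0.0, 0.0, 1.0],
}


def embed(text):
    """Mean-pool the static vectors of the known tokens in `text`."""
    vectors = [TOY_VECTORS[t] for t in text.lower().split() if t in TOY_VECTORS]
    if not vectors:
        # No known tokens: fall back to the zero vector.
        return [0.0, 0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / norm


def deduplicate(texts, threshold=0.9):
    """Greedy pass: keep a text only if it is not too similar to any kept text."""
    kept = []  # list of (text, vector) pairs
    for text in texts:
        vector = embed(text)
        if all(cosine(vector, kept_vector) < threshold for _, kept_vector in kept):
            kept.append((text, vector))
    return [text for text, _ in kept]
```

Because each text is embedded with a table lookup and an average rather than a forward pass through a neural network, this style of embedding is cheap enough to run on CPU. The greedy comparison above is quadratic in the number of kept texts; a production tool would more plausibly rely on an approximate-nearest-neighbor index for the similarity search.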

🛠️Products & Tools (1)

Semble (Open Source, AI Coding)

Open-source code search library purpose-built for AI coding agents: roughly 98% fewer tokens than grep-plus-read at higher recall, with sub-2-millisecond query latency.
