Learning Objectives
- Understand how autoresearch automates machine learning experimentation using AI agents
- Evaluate the tool's requirements, limitations, and real-world results
- Compare autoresearch to traditional AutoML and other automated experimentation approaches
What Is autoresearch?
autoresearch is an open-source tool created by Andrej Karpathy, former Director of AI at Tesla and a founding member of OpenAI. Released on March 7, 2026, it lets an AI agent autonomously run machine learning experiments on a single GPU. The agent modifies the training code, runs a 5-minute experiment, and checks whether the result improved. It keeps the change if it did, discards it if it did not, and repeats.
The result is approximately 12 experiments per hour and roughly 100 experiments overnight, all without human intervention. The whole project is compact, about 630 lines of Python, and is released under the permissive MIT License.
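The throughput figures are simple arithmetic. A 5-minute run fits 12 times into an hour, and an 8-hour night (an assumed window) fits 96, or roughly 100. The sketch below also accepts a hypothetical per-experiment overhead for the agent's reasoning, code edits, and evaluation. The round figures leave this overhead out, so real counts land at or below the ideal.

```python
def experiments_per_window(window_hours: float, run_minutes: float = 5.0,
                           overhead_minutes: float = 0.0) -> int:
    """Number of whole experiments that fit in a time window.

    run_minutes: fixed training budget per experiment (5 minutes here).
    overhead_minutes: hypothetical per-experiment cost for agent reasoning,
    edits, startup, and evaluation; not a figure from the project.
    """
    minutes_per_experiment = run_minutes + overhead_minutes
    return int(window_hours * 60 // minutes_per_experiment)


print(experiments_per_window(1))                        # 12 per hour, no overhead
print(experiments_per_window(8))                        # 96 over an 8-hour night
print(experiments_per_window(8, overhead_minutes=1.0))  # 80 with 1 min overhead
```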
Within weeks of release, autoresearch gained 60,000+ GitHub stars and 8,400+ forks, making it one of the fastest-growing open-source AI projects of 2026.
✅Tip
Get started: clone the repository at github.com/karpathy/autoresearch. You need a single NVIDIA GPU, Python, and access to an LLM for the agent.
How It Works
The design is deliberately minimal — three files, one GPU, one metric:
| File | Purpose |
|---|---|
| prepare.py | Data preparation — run once; never touched by the agent |
| train.py | The single file the AI agent edits freely during experiments |
| program.md | Markdown instructions written by the human researcher to guide the agent's behavior |
Each experiment follows a strict loop:
- The AI agent reads the current train.py and program.md
- It proposes a modification to train.py (new architecture tweak, hyperparameter change, or optimization)
- Training runs for exactly 5 minutes regardless of hardware
- If the result improves, the change is kept; otherwise it is discarded
- The loop repeats
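The loop above amounts to a greedy hill climb, where a candidate is kept only if it beats the best result so far. The toy sketch below is not the project's code. It assumes a small dataclass stands in for train.py, and `train_and_evaluate` and `propose` are hypothetical stand-ins for the 5-minute training run and the LLM agent. The real tool edits source code rather than nudging two numbers.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Config:
    """Toy stand-in for train.py: two knobs instead of real source code."""
    learning_rate: float
    width: int


def train_and_evaluate(cfg: Config) -> float:
    """Hypothetical stand-in for one fixed-budget training run.

    Returns a made-up validation loss (lower is better) that is smallest
    near learning_rate=3e-3 and width=512.
    """
    lr_term = (math.log10(cfg.learning_rate) - math.log10(3e-3)) ** 2
    width_term = (math.log2(cfg.width) - 9) ** 2 / 10
    return 1.0 + lr_term + width_term


def propose(cfg: Config, rng: random.Random) -> Config:
    """Hypothetical stand-in for the LLM agent: change one knob at random."""
    if rng.random() < 0.5:
        return Config(cfg.learning_rate * rng.choice([0.5, 2.0]), cfg.width)
    new_width = rng.choice([max(32, cfg.width // 2), cfg.width * 2])
    return Config(cfg.learning_rate, new_width)


def research_loop(initial: Config, n_experiments: int, seed: int = 0):
    """Greedy keep-or-discard loop; returns (best_cfg, best_loss, n_kept)."""
    rng = random.Random(seed)
    best_cfg = initial
    best_loss = train_and_evaluate(best_cfg)   # baseline run
    kept = 0
    for _ in range(n_experiments):
        candidate = propose(best_cfg, rng)      # agent proposes an edit
        loss = train_and_evaluate(candidate)    # fixed-budget experiment
        if loss < best_loss:                    # keep strict improvements only
            best_cfg, best_loss = candidate, loss
            kept += 1
        # otherwise the candidate is discarded and the loop repeats
    return best_cfg, best_loss, kept


start = Config(learning_rate=1e-4, width=64)
best, best_loss, kept = research_loop(start, n_experiments=100)
print(best, round(best_loss, 3), kept)
```

Because only strict improvements are kept, the best score never gets worse. The trade-off is that a purely greedy loop can stall in a local optimum. In the real tool, escaping one depends on the agent proposing a qualitatively different change.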
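The 5-minute fixed training window is a key design choice. Every experiment gets the same wall-clock compute on a given machine, so a change that makes the model larger, smaller, or faster is judged by what it achieves in the same time. This also keeps the feedback loop tight. The trade-off is that results are not directly comparable across different GPUs, because faster hardware completes more training steps in the same 5 minutes.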
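A time-budgeted run differs from the more common fixed-step run in what stays constant: the clock, not the step count. Here is a minimal sketch, assuming a hypothetical `step_fn` that performs one optimizer step and returns its loss. A fake 10 ms step stands in for real training so the demo finishes quickly.

```python
import time
from typing import Callable, Tuple


def train_for_budget(step_fn: Callable[[], float],
                     budget_seconds: float) -> Tuple[int, float]:
    """Run training steps until a wall-clock budget is used up.

    step_fn is a hypothetical callable that performs one optimizer step and
    returns its loss. The time budget is fixed; the number of steps is not,
    so a faster GPU or a smaller model simply completes more steps.
    Returns (steps_completed, last_loss).
    """
    start = time.monotonic()  # monotonic clock is unaffected by system clock changes
    steps, last_loss = 0, float("nan")
    while time.monotonic() - start < budget_seconds:
        last_loss = step_fn()
        steps += 1
    return steps, last_loss


def fake_step() -> float:
    time.sleep(0.01)  # pretend one optimizer step takes about 10 ms
    return 0.5


# the tool's budget is 5 minutes (5 * 60 s); 0.1 s keeps this demo quick
steps, loss = train_for_budget(fake_step, budget_seconds=0.1)
print(steps, loss)
```

The deadline is checked before each step, so a step that starts just before the budget expires still finishes. A run can therefore overshoot by up to one step.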
What Makes It Different from AutoML?
Traditional AutoML and Neural Architecture Search (NAS) frameworks search over architectures or hyperparameters using structured algorithms. They are precise but constrained to their predefined search space.
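The phrase "constrained to their predefined search space" can be shown concretely with a plain grid search, the simplest structured method. Real AutoML and NAS tools use smarter strategies such as Bayesian optimization or evolutionary search, but they still search within a space someone defined. The objective `toy_loss` below is hypothetical, and its optimum of 12 layers is deliberately placed outside the grid.

```python
import itertools
import math


def grid_search(space, evaluate):
    """Evaluate every combination in a predefined search space.

    space maps a parameter name to the list of values a human chose to allow;
    the answer can only ever be one of those combinations.
    """
    names = list(space)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_loss(params):
    """Hypothetical objective whose true optimum is lr=3e-3 with 12 layers."""
    lr_gap = abs(math.log10(params["learning_rate"]) - math.log10(3e-3))
    return lr_gap + abs(params["num_layers"] - 12) / 10


space = {"learning_rate": [1e-4, 1e-3, 1e-2], "num_layers": [4, 8]}
best, score = grid_search(space, toy_loss)
print(best)  # {'learning_rate': 0.001, 'num_layers': 8}
```

However long it runs, this search never tries 12 layers, a different optimizer, or a new normalization scheme, because nobody listed them.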
autoresearch is fundamentally different: the AI agent can read research papers, develop hypotheses, and try creative improvements that no predefined search space would include. The search space is defined by what the LLM can think of — not by what a human engineer enumerated in advance.
| Approach | Search Space | Creativity | Setup |
|---|---|---|---|
| AutoML / NAS | Predefined hyperparameters and architectures | Low — only explores what humans defined | Complex configuration |
| autoresearch | Anything the LLM can propose | High — can read papers and develop novel hypotheses | 630 lines of Python; one file |
Real-World Results
Karpathy pointed autoresearch at nanochat, his already well-optimized GPT-2 training codebase. Over two days, the agent ran approximately 700 experiments and found around 20 genuine improvements to an already-optimized baseline.
Shopify CEO Tobi Lütke applied it to Shopify's templating engine and reported 53% faster rendering from 93 automated commits. This suggests the approach can be adapted beyond ML training code.
Requirements
| Requirement | Details |
|---|---|
| GPU | Single NVIDIA GPU (any recent model) |
| Software | Python; CUDA toolkit |
| LLM API | Requires access to an LLM API (for the agent's reasoning) |
| License | MIT — permissive; enterprise-friendly |
| Codebase size | 630 lines of Python |
| GitHub Stars | 60,000+ (as of March 2026) |
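Before leaving a run going overnight, a quick preflight check can catch a missing GPU or API key. This helper is hypothetical and not part of autoresearch. It assumes `nvidia-smi` (installed with the NVIDIA driver) is on the PATH, and the environment variable name `LLM_API_KEY` is a placeholder for whatever your LLM provider or agent tool actually reads.

```python
import os
import shutil
import subprocess
import sys


def preflight(api_key_var: str = "LLM_API_KEY") -> dict:
    """Report whether the basic requirements for a run look satisfied.

    api_key_var is a hypothetical environment variable name; substitute
    whatever your LLM provider or agent tool actually reads.
    """
    gpus = []
    smi = shutil.which("nvidia-smi")  # ships with the NVIDIA driver
    if smi:
        result = subprocess.run(
            [smi, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=False,
        )
        gpus = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return {
        "python_version": sys.version.split()[0],
        "nvidia_gpus": gpus,
        "has_gpu": len(gpus) >= 1,
        "llm_api_key_set": bool(os.environ.get(api_key_var)),
    }


print(preflight())
```

This only confirms that the driver can see a GPU. It does not verify the CUDA toolkit version or that the API key is valid.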
Strengths
- Radically simple — 630 lines, three files, one GPU; no complex setup or configuration
- Autonomous overnight experimentation — run 100+ experiments while you sleep
- Creative search space — the LLM agent can propose improvements no predefined search would include
- Fixed 5-minute training window, which makes experiments on a given machine directly comparable and keeps the feedback loop tight
- MIT licensed — free for commercial and personal use
- Proven results — 20 improvements found on an already-optimized codebase; 53% speedup at Shopify
- Massive community — 60,000+ GitHub stars and active development within weeks of release
Limitations and Considerations
- Single GPU only — does not currently support multi-GPU or distributed training setups
- Requires an LLM API — the agent needs access to a large language model for reasoning, which adds API cost
- ML-focused — designed for machine learning training code; applying it to other domains requires adaptation
- 5-minute window — some experiments may need longer training to show meaningful results
- No guarantees — the agent can propose changes that appear to improve metrics but may not generalize; human review of accepted changes is recommended
- Early stage — released March 2026; the tool and community are still maturing
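Some changes look better only because of run-to-run noise. One way to reduce that risk is to re-run the baseline and the candidate a few times with different seeds, then accept only a gap larger than the observed spread. The source does not say autoresearch does this. The helper below is a hypothetical sketch of such a check, and the loss values in the usage lines are made up for illustration.

```python
import statistics


def is_real_improvement(baseline_losses, candidate_losses, min_delta=0.0):
    """Keep a change only if its mean loss beats the baseline by more than
    the observed run-to-run spread plus an optional margin.

    Each list holds validation losses (lower is better) from repeated runs
    with different seeds; at least two runs per side are needed for stdev.
    """
    gap = statistics.mean(baseline_losses) - statistics.mean(candidate_losses)
    noise = max(statistics.stdev(baseline_losses), statistics.stdev(candidate_losses))
    return gap > noise + min_delta


baseline = [1.000, 1.004, 0.996]
print(is_real_improvement(baseline, [0.998, 1.003, 0.995]))  # False: gap is within noise
print(is_real_improvement(baseline, [0.980, 0.982, 0.978]))  # True: gap far exceeds noise
```

Repeated runs multiply compute per experiment, so this trades overnight throughput for fewer false positives. It complements human review rather than replacing it.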
Company Details
| Detail | Info |
|---|---|
| Creator | Andrej Karpathy |
| Company | Eureka Labs (AI-native education startup) |
| Released | March 7, 2026 |
| License | MIT (open-source) |
| GitHub | github.com/karpathy/autoresearch |
| Stars | 60,000+ (March 2026) |
| Pricing | Free (open-source); LLM API costs apply |
| Website | eurekalabs.ai |
Related Tools
- Claude Agent SDK — Anthropic's framework for building custom AI agents
- OpenAI Agents SDK — OpenAI's agent building toolkit
- CrewAI — role-based multi-agent framework
- Paperclip — org chart-based AI agent orchestration platform
Key Takeaways
- autoresearch is Andrej Karpathy's open-source tool that lets an AI agent autonomously run approximately 100 ML experiments overnight on a single GPU — modifying code, testing, and keeping only improvements
- Unlike traditional AutoML, the agent can read research papers and propose creative improvements beyond any predefined search space
- Proven at scale: 20 improvements found on Karpathy's already-optimized codebase, and 53% rendering speedup at Shopify from 93 automated commits
- 630 lines of Python, MIT licensed, 60,000+ GitHub stars — one of the fastest-growing open-source AI projects of 2026