Learning Objectives

Understand how autoresearch automates machine learning experimentation using AI agents
Evaluate the tool's requirements, limitations, and real-world results
Compare autoresearch to traditional AutoML and other automated experimentation approaches

What Is autoresearch?

autoresearch is an open-source tool created by Andrej Karpathy, the former Director of AI at Tesla and a founding member of OpenAI. Released on March 7, 2026, it lets an AI agent autonomously run machine learning experiments on a single GPU. The agent modifies the training code, runs a 5-minute experiment, and checks whether the result improved. If it did, the change is kept; if not, it is discarded. The loop then repeats.

The result: approximately 12 experiments per hour and roughly 100 experiments overnight, all without human intervention. The entire project is about 630 lines of Python, released under the permissive MIT License.
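The throughput figures follow directly from the 5-minute window. A quick check, assuming an 8-hour night (the night length is an assumption for illustration) and ignoring agent and startup overhead:

```python
# Back-of-the-envelope throughput, assuming each experiment takes exactly
# its 5-minute training window (agent and startup overhead are ignored).
MINUTES_PER_EXPERIMENT = 5


def experiments_in(hours: float, minutes_per_experiment: float = MINUTES_PER_EXPERIMENT) -> int:
    """Whole experiments that fit in the given number of hours."""
    return int(hours * 60 // minutes_per_experiment)


print(experiments_in(1))  # 12
print(experiments_in(8))  # 96, i.e. roughly 100 for an assumed 8-hour night
```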

Within weeks of release, autoresearch gained 60,000+ GitHub stars and 8,400+ forks, making it one of the fastest-growing open-source AI projects of 2026.

✅ Tip

Get started: clone the repository at github.com/karpathy/autoresearch. You need a single NVIDIA GPU, Python with the CUDA toolkit, and access to an LLM API.

How It Works

The design is deliberately minimal — three files, one GPU, one metric:

File	Purpose
prepare.py	Data preparation — run once; never touched by the agent
train.py	The single file the AI agent edits freely during experiments
program.md	Markdown instructions written by the human researcher to guide the agent's behavior

Each experiment follows a strict loop:

1. The AI agent reads the current train.py and program.md
2. It proposes a modification to train.py (an architecture tweak, a hyperparameter change, or an optimization)
3. Training runs for exactly 5 minutes regardless of hardware
4. If the metric improves, the change is kept; otherwise it is discarded
5. The loop repeats with the next proposal
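The steps above amount to a greedy keep-or-discard search. The sketch below illustrates that idea only and is not the project's actual code: research_loop, propose, and evaluate are hypothetical names, and it assumes a lower metric is better (as with a validation loss).

```python
from typing import Callable


def research_loop(
    source: str,
    propose: Callable[[str], str],     # hypothetical: the agent returns a modified train.py
    evaluate: Callable[[str], float],  # hypothetical: runs training and returns the metric
    n_experiments: int,
) -> tuple[str, float]:
    """Greedy keep-or-discard loop. Assumes a lower metric is better."""
    best_source = source
    best_score = evaluate(best_source)
    for _ in range(n_experiments):
        candidate = propose(best_source)
        score = evaluate(candidate)
        if score < best_score:  # improvement: keep the change
            best_source, best_score = candidate, score
        # otherwise the candidate is discarded and the loop continues
    return best_source, best_score


# Toy stand-ins so the sketch runs: the "code" is just a number in a string,
# and the "metric" is its distance from 3.0.
steps = iter([1.0, 1.0, -5.0, 0.5, 1.0])
toy_propose = lambda src: str(float(src) + next(steps))
toy_evaluate = lambda src: abs(float(src) - 3.0)

best, score = research_loop("0.0", toy_propose, toy_evaluate, n_experiments=5)
print(best, score)  # 2.5 0.5
```

In the toy run, the third and fifth proposals do not beat the current best, so they are dropped. Note that a tie is also discarded: a change must strictly improve the metric to be kept.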

The fixed 5-minute training window is a key design choice: every experiment gets the same wall-clock budget on whatever GPU is available, so results from successive runs on that machine stay comparable and the feedback loop stays tight.
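One way to enforce a fixed wall-clock budget is a deadline check inside the training loop. This is a minimal sketch under stated assumptions, not how autoresearch itself implements the window: train_for, step, and clock are hypothetical names, and a real run would pass 300 seconds rather than the short demo budget used here.

```python
import time


def train_for(budget_seconds: float, step, clock=time.monotonic) -> int:
    """Call step() repeatedly until the wall-clock budget is used up.

    Returns the number of completed steps. A step that starts just before
    the deadline is allowed to finish, so the run can overshoot slightly.
    """
    deadline = clock() + budget_seconds
    steps = 0
    while clock() < deadline:
        step()
        steps += 1
    return steps


# Demo with a tiny budget and a fake "training step" that just sleeps.
steps_done = train_for(0.05, lambda: time.sleep(0.01))
print(steps_done)  # the count varies by machine
```

With a time budget instead of a step budget, a faster GPU simply completes more steps within the same window.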

What Makes It Different from AutoML?

Traditional AutoML and Neural Architecture Search (NAS) frameworks search over architectures or hyperparameters using structured algorithms. They are precise but constrained to their predefined search space.

autoresearch is fundamentally different: the AI agent can read research papers, develop hypotheses, and try creative improvements that no predefined search space would include. The search space is defined by what the LLM can think of — not by what a human engineer enumerated in advance.

Approach	Search Space	Creativity	Setup
AutoML / NAS	Predefined hyperparameters and architectures	Low: only explores what humans defined	Complex configuration
autoresearch	Anything the LLM can propose	High: can read papers and develop novel hypotheses	About 630 lines of Python; the agent edits one file
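To make "predefined search space" concrete, here is a minimal grid-search sketch. The parameter names and values are hypothetical. Real AutoML systems often use smarter strategies than an exhaustive grid, but they still search only inside a space declared in advance.

```python
from itertools import product

# Hypothetical search space, fixed by a human before the search starts.
search_space = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "batch_size": [32, 64],
    "n_layers": [6, 12],
}


def grid(space: dict) -> list[dict]:
    """Enumerate every configuration in a predefined search space."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


configs = grid(search_space)
print(len(configs))  # 12 = 3 * 2 * 2
print(configs[0])    # {'learning_rate': 0.0001, 'batch_size': 32, 'n_layers': 6}
```

Every candidate is one of these 12 dictionaries. An idea outside the declared keys and values, such as a different optimizer, can never be tried. That is the constraint the LLM-driven approach removes.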

Real-World Results

Karpathy pointed autoresearch at nanochat, his already well-optimized GPT-2 training codebase. Over two days, the agent ran approximately 700 experiments and found around 20 genuine improvements to an already-optimized baseline.

Shopify CEO Tobi Lütke applied it to Shopify's templating engine and reported 53% faster rendering from 93 automated commits, which suggests the approach can carry over to code outside ML training.

Requirements

Requirement	Details
GPU	Single NVIDIA GPU (any recent model)
Software	Python; CUDA toolkit
LLM API	Requires access to an LLM API (for the agent's reasoning)
License	MIT — permissive; enterprise-friendly
Codebase size	630 lines of Python
GitHub Stars	60,000+ (as of March 2026)

Strengths

Radically simple — 630 lines, three files, one GPU; no complex setup or configuration
Autonomous overnight experimentation — run 100+ experiments while you sleep
Creative search space — the LLM agent can propose improvements no predefined search would include
Fixed 5-minute training window — normalizes experiments across hardware; keeps feedback loop tight
MIT licensed — free for commercial and personal use
Proven results — 20 improvements found on an already-optimized codebase; 53% speedup at Shopify
Massive community — 60,000+ GitHub stars and active development within weeks of release

Limitations and Considerations

Single GPU only — does not currently support multi-GPU or distributed training setups
Requires an LLM API — the agent needs access to a large language model for reasoning, which adds API cost
ML-focused — designed for machine learning training code; applying it to other domains requires adaptation
5-minute window — some experiments may need longer training to show meaningful results
No guarantees — the agent can propose changes that appear to improve metrics but may not generalize; human review of accepted changes is recommended
Early stage — released March 2026; the tool and community are still maturing
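One way to support human review of an accepted change is to rerun the baseline and the candidate with several random seeds and compare the gap to the run-to-run noise. The sketch below is a rough heuristic, not a formal statistical test and not part of autoresearch. looks_robust and min_sigma are hypothetical names, the numbers are made up for illustration, and it assumes a lower metric is better.

```python
from statistics import mean, stdev


def looks_robust(baseline: list[float], candidate: list[float], min_sigma: float = 2.0) -> bool:
    """Heuristic check, assuming lower is better.

    The candidate's mean must beat the baseline's mean by more than
    min_sigma times the larger run-to-run standard deviation.
    Each list needs at least two runs for stdev to be defined.
    """
    gap = mean(baseline) - mean(candidate)
    noise = max(stdev(baseline), stdev(candidate))
    return gap > min_sigma * noise


# Made-up metrics from three seeds each, for illustration only.
baseline_runs = [1.00, 1.02, 0.98]
small_gain = [0.99, 1.01, 0.97]  # gap of about 0.01, within the noise
clear_gain = [0.90, 0.91, 0.89]  # gap of about 0.10, well above the noise

print(looks_robust(baseline_runs, small_gain))  # False
print(looks_robust(baseline_runs, clear_gain))  # True
```

The trade-off is cost: every extra seed is another 5-minute run, so a check like this fits better as a periodic audit of kept changes than as part of every experiment.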

Company Details

Detail	Info
Creator	Andrej Karpathy
Company	Eureka Labs (AI-native education startup)
Released	March 7, 2026
License	MIT (open-source)
GitHub	github.com/karpathy/autoresearch
Stars	60,000+ (March 2026)
Pricing	Free (open-source); LLM API costs apply
Website	eurekalabs.ai

Related Tools

Claude Agent SDK — Anthropic's framework for building custom AI agents
OpenAI Agents SDK — OpenAI's agent building toolkit
CrewAI — role-based multi-agent framework
Paperclip — org chart-based AI agent orchestration platform

Key Takeaways

autoresearch is Andrej Karpathy's open-source tool that lets an AI agent autonomously run approximately 100 ML experiments overnight on a single GPU — modifying code, testing, and keeping only improvements
Unlike traditional AutoML, the agent can read research papers and propose creative improvements beyond any predefined search space
Reported results: around 20 improvements found on Karpathy's already-optimized codebase, and 53% faster rendering at Shopify from 93 automated commits
630 lines of Python, MIT licensed, 60,000+ GitHub stars — one of the fastest-growing open-source AI projects of 2026
