5 min read · Updated March 29, 2026

autoresearch

By Eureka Labs

autoresearch is Andrej Karpathy's open-source AI agent that autonomously runs ML experiments on a single GPU — modifying training code, keeping improvements, and discarding failures at a rate of approximately 100 experiments overnight.

Learning Objectives

  • Understand how autoresearch automates machine learning experimentation using AI agents
  • Evaluate the tool's requirements, limitations, and real-world results
  • Compare autoresearch to traditional AutoML and other automated experimentation approaches

What Is autoresearch?

autoresearch is an open-source tool created by Andrej Karpathy, the former Director of AI at Tesla and a founding member of OpenAI. Released on March 7, 2026, it lets an AI agent autonomously run machine learning experiments on a single GPU. The agent modifies the training code, runs a 5-minute experiment, checks whether the result improved, keeps the change if it did (discarding it otherwise), and repeats.

The result: approximately 12 experiments per hour and roughly 100 experiments overnight, all without human intervention. The codebase is compact, about 630 lines of Python, and is released under the permissive MIT License.
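The headline throughput follows directly from the fixed 5-minute budget. A small Python sketch of the arithmetic, assuming an 8-hour overnight run and treating per-experiment overhead as a hypothetical parameter:

```python
def experiments_per_window(hours: float, minutes_per_run: float = 5.0,
                           overhead_minutes: float = 0.0) -> int:
    """Return how many whole experiments fit in a time window.

    Hypothetical helper: each run takes the fixed training budget plus any
    per-run overhead (agent reasoning, startup, evaluation).
    """
    minutes_per_experiment = minutes_per_run + overhead_minutes
    return int(hours * 60 // minutes_per_experiment)


print(experiments_per_window(1))                         # 12 per hour
print(experiments_per_window(8))                         # 96 over an assumed 8-hour night
print(experiments_per_window(8, overhead_minutes=1.0))   # 80 if each run adds 1 minute of overhead
```

The exact overnight count therefore depends on how long the run lasts and how much overhead each experiment carries beyond the 5 minutes of training.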

Within weeks of release, autoresearch gained 60,000+ GitHub stars and 8,400+ forks, making it one of the fastest-growing open-source AI projects of 2026.

Tip

Get started: Clone the repository at github.com/karpathy/autoresearch. Requires a single NVIDIA GPU and Python.

How It Works

The design is deliberately minimal — three files, one GPU, one metric:

| File | Purpose |
|---|---|
| prepare.py | Data preparation; run once and never touched by the agent |
| train.py | The single file the AI agent edits freely during experiments |
| program.md | Markdown instructions written by the human researcher to guide the agent's behavior |
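The table above describes a contract: the agent edits train.py and nothing else. One hypothetical way a harness could check that contract (not necessarily how autoresearch does it) is to fingerprint the other files before each experiment and compare afterward:

```python
import hashlib
from pathlib import Path

# Files the agent should never modify (hypothetical harness-side check;
# train.py is deliberately excluded because it is the agent's to edit).
PROTECTED = ("prepare.py", "program.md")


def fingerprint(paths, root: str = ".") -> dict[str, str]:
    """Map each file name to the SHA-256 digest of its current contents."""
    return {p: hashlib.sha256((Path(root) / p).read_bytes()).hexdigest() for p in paths}


def check_untouched(before: dict[str, str], root: str = ".") -> None:
    """Raise if any protected file changed since `before` was recorded."""
    after = fingerprint(before.keys(), root)
    changed = [p for p in before if before[p] != after[p]]
    if changed:
        raise RuntimeError(f"agent modified protected files: {changed}")
```

Record `before = fingerprint(PROTECTED)` ahead of an experiment and call `check_untouched(before)` after the agent's edit: a change to train.py passes, while a change to prepare.py raises.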

Each experiment follows a strict loop:

  1. The AI agent reads the current train.py and program.md
  2. It proposes a modification to train.py (new architecture tweak, hyperparameter change, or optimization)
  3. Training runs for exactly 5 minutes regardless of hardware
  4. If the result improves, the change is kept; otherwise it is discarded
  5. The loop repeats
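The loop above is a greedy hill climb. A minimal Python sketch follows, assuming lower scores are better (as with a validation loss) and using plain callables as hypothetical stand-ins for the LLM agent and the 5-minute training run; it is not autoresearch's actual code, which has an LLM agent edit train.py under the guidance of program.md.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def keep_or_discard(
    baseline: T,
    propose: Callable[[T], T],        # stands in for the LLM agent editing train.py
    evaluate: Callable[[T], float],   # stands in for one fixed-budget run; lower is better
    n_experiments: int,
) -> tuple[T, float, int]:
    """Greedy hill climbing: a candidate replaces the current best only if it scores better."""
    best, best_score = baseline, evaluate(baseline)
    kept = 0
    for _ in range(n_experiments):
        candidate = propose(best)            # steps 1-2: read the current best, propose a change
        score = evaluate(candidate)          # step 3: run the experiment
        if score < best_score:               # step 4: keep strict improvements...
            best, best_score = candidate, score
            kept += 1
        # ...otherwise the candidate is dropped and the loop repeats (step 5)
    return best, best_score, kept


# Toy usage: the "code" is just a learning rate, and the "metric" is distance from 0.01.
rng = random.Random(0)
best_lr, best_loss, n_kept = keep_or_discard(
    baseline=0.1,
    propose=lambda lr: lr * rng.uniform(0.5, 1.5),
    evaluate=lambda lr: (lr - 0.01) ** 2,
    n_experiments=100,
)
print(f"best lr={best_lr:.4f}, loss={best_loss:.2e}, kept {n_kept}/100 changes")
```

Because a candidate is only ever compared against the current best, the best score can never get worse over the run; the cost is that every rejected experiment is simply thrown away.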

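The fixed 5-minute training window is a key design choice. Because every run gets the same wall-clock budget no matter what the agent changes (model size, batch size, optimizer), experiments on a given machine stay directly comparable and the feedback loop stays tight. The trade-off is that a faster GPU completes more training steps in those 5 minutes, so results are tuned to the hardware they ran on and are not directly comparable across different GPUs.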
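A minimal sketch of a fixed wall-clock training budget, assuming a generic `step()` callable (hypothetical; this is not autoresearch's actual training loop). It shows why a time budget keeps runs on one machine comparable while letting faster hardware fit in more steps:

```python
import time
from typing import Callable


def train_with_time_budget(step: Callable[[], None], budget_seconds: float) -> int:
    """Call `step` repeatedly until the wall-clock budget is spent; return steps completed.

    The budget is checked before each step, so the final step may finish slightly late.
    """
    start = time.monotonic()
    steps = 0
    while time.monotonic() - start < budget_seconds:
        step()
        steps += 1
    return steps


# Simulate a slow and a fast GPU with sleeps (0.2 s stands in for 5 minutes).
slow = train_with_time_budget(lambda: time.sleep(0.02), budget_seconds=0.2)
fast = train_with_time_budget(lambda: time.sleep(0.001), budget_seconds=0.2)
print(f"slow hardware: {slow} steps, fast hardware: {fast} steps")
```

`time.monotonic()` is used rather than `time.time()` so that system clock adjustments cannot stretch or shrink the budget.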

What Makes It Different from AutoML?

Traditional AutoML and Neural Architecture Search (NAS) frameworks search over architectures or hyperparameters using structured algorithms. They are precise but constrained to their predefined search space.

autoresearch is fundamentally different: the AI agent can read research papers, develop hypotheses, and try creative improvements that no predefined search space would include. The search space is defined by what the LLM can think of — not by what a human engineer enumerated in advance.

| Approach | Search space | Creativity | Setup |
|---|---|---|---|
| AutoML / NAS | Predefined hyperparameters and architectures | Low: only explores what humans defined | Complex configuration |
| autoresearch | Anything the LLM can propose | High: can read papers and develop novel hypotheses | 630 lines of Python; the agent edits a single file |
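To make the contrast concrete, here is a toy grid search, the simplest form of the predefined-space search that AutoML tools generalize. Real AutoML and NAS frameworks use smarter strategies such as Bayesian optimization or evolutionary search, but they still sample from a declared space. The parameter names and values below are hypothetical.

```python
from itertools import product

# Hypothetical search space: every candidate must be enumerated up front.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "batch_size": [32, 64],
    "n_layers": [6, 12],
}


def grid(space: dict) -> list[dict]:
    """Expand a dict of option lists into every combination (a classic grid search)."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*space.values())]


configs = grid(SEARCH_SPACE)
print(len(configs))  # 12 candidates (3 * 2 * 2); nothing outside this set is ever tried
```

In autoresearch, by contrast, a "candidate" is an arbitrary edit to train.py, so there is no finite list to enumerate.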

Real-World Results

Karpathy pointed autoresearch at nanochat, his already well-optimized GPT-2 training codebase. Over two days, the agent ran approximately 700 experiments and found around 20 genuine improvements to that baseline.
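The reported figures imply a low acceptance rate. The arithmetic below uses only the approximate numbers quoted above:

```python
experiments = 700    # approximate number of runs reported for the nanochat experiment
improvements = 20    # approximate number of changes that were kept

acceptance_rate = improvements / experiments
print(f"kept: {acceptance_rate:.1%}, discarded: {1 - acceptance_rate:.1%}")
# kept: 2.9%, discarded: 97.1%
```

The vast majority of proposals are discarded, which is why a cheap, automatic keep-or-discard step matters more than any single proposal being good.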

Shopify CEO Tobi Lütke applied it to Shopify's templating engine and reported 53% faster rendering from 93 automated commits, suggesting the approach can extend beyond ML training code.

Requirements

| Requirement | Details |
|---|---|
| GPU | Single NVIDIA GPU (any recent model) |
| Software | Python; CUDA toolkit |
| LLM API | Access to an LLM API (for the agent's reasoning) |
| License | MIT (permissive; enterprise-friendly) |
| Codebase size | 630 lines of Python |
| GitHub stars | 60,000+ (as of March 2026) |
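Before leaving a run going overnight, it can help to check these requirements programmatically. The sketch below is hypothetical: the environment variable name `LLM_API_KEY` is a placeholder (the real name depends on your LLM provider), and the CUDA check assumes a PyTorch-based training script, which the table above does not state.

```python
import importlib.util
import os
import shutil


def preflight(api_key_var: str = "LLM_API_KEY") -> dict[str, bool]:
    """Hypothetical checklist mirroring the requirements table before an overnight run."""
    checks = {
        "nvidia_driver": shutil.which("nvidia-smi") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
        "llm_api_key_set": bool(os.environ.get(api_key_var)),
    }
    if checks["torch_installed"]:
        import torch  # assumption: the training code is PyTorch-based
        checks["cuda_available"] = torch.cuda.is_available()
    return checks


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```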

Strengths

  • Radically simple — 630 lines, three files, one GPU; no complex setup or configuration
  • Autonomous overnight experimentation — run 100+ experiments while you sleep
  • Creative search space — the LLM agent can propose improvements no predefined search would include
  • Fixed 5-minute training window: keeps experiments on the same machine comparable and the feedback loop tight
  • MIT licensed — free for commercial and personal use
  • Reported results: around 20 improvements found on an already-optimized codebase; 53% faster rendering at Shopify
  • Massive community — 60,000+ GitHub stars and active development within weeks of release

Limitations and Considerations

  • Single GPU only — does not currently support multi-GPU or distributed training setups
  • Requires an LLM API — the agent needs access to a large language model for reasoning, which adds API cost
  • ML-focused — designed for machine learning training code; applying it to other domains requires adaptation
  • 5-minute window — some experiments may need longer training to show meaningful results
  • No guarantees — the agent can propose changes that appear to improve metrics but may not generalize; human review of accepted changes is recommended
  • Early stage — released March 2026; the tool and community are still maturing
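The "No guarantees" limitation above can be partly mitigated. The sketch below is a hypothetical safeguard a human reviewer might add, not part of autoresearch: before trusting a kept change, re-run it on a few seeds and require the average gain to clear a margin (lower scores are assumed to be better).

```python
import statistics
from typing import Callable, Sequence


def confirm_improvement(
    run: Callable[[int], float],          # hypothetical: one training run with a given seed -> val loss
    baseline_scores: Sequence[float],     # scores of the current best train.py on the same seeds
    seeds: Sequence[int] = (0, 1, 2),
    min_gain: float = 0.0,
) -> bool:
    """Return True only if the candidate's mean score beats the baseline mean by more than min_gain.

    Re-running on several seeds costs extra GPU time but filters out
    "improvements" that were just run-to-run noise.
    """
    candidate_scores = [run(seed) for seed in seeds]
    gain = statistics.mean(baseline_scores) - statistics.mean(candidate_scores)
    return gain > min_gain


baseline = [1.00, 1.02, 0.98]
print(confirm_improvement(lambda seed: 0.95 + 0.01 * seed, baseline, min_gain=0.01))  # True
print(confirm_improvement(lambda seed: 0.995, baseline, min_gain=0.01))               # False
```

The number of seeds and `min_gain` are illustrative defaults; choosing them trades GPU time against the risk of keeping a change that only looked better by chance.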

Company Details

| Detail | Info |
|---|---|
| Creator | Andrej Karpathy |
| Company | Eureka Labs (AI-native education startup) |
| Released | March 7, 2026 |
| License | MIT (open-source) |
| GitHub | github.com/karpathy/autoresearch |
| Stars | 60,000+ (March 2026) |
| Pricing | Free (open-source); LLM API costs apply |
| Website | eurekalabs.ai |

Key Takeaways

  • autoresearch is Andrej Karpathy's open-source tool that lets an AI agent autonomously run approximately 100 ML experiments overnight on a single GPU — modifying code, testing, and keeping only improvements
  • Unlike traditional AutoML, the agent can read research papers and propose creative improvements beyond any predefined search space
  • Early results: around 20 improvements found on Karpathy's already-optimized codebase, and a 53% rendering speedup at Shopify from 93 automated commits
  • 630 lines of Python, MIT licensed, 60,000+ GitHub stars — one of the fastest-growing open-source AI projects of 2026
