6 min read · Updated April 29, 2026

Core ML

By Apple

Core ML is Apple's on-device machine learning framework for deploying trained models across iPhone, iPad, Mac, Apple Watch, and Apple TV. It supports PyTorch and TensorFlow conversion via coremltools, automatically routes inference across the CPU, GPU, and Neural Engine, and offers low-bit quantization for compact deployments.

Learning Objectives

  • Understand Core ML's role in Apple's on-device AI deployment workflow
  • Identify how PyTorch and TensorFlow models are converted via coremltools
  • Evaluate when Core ML fits an iOS / macOS development workflow

What Is Core ML?

Core ML is Apple's on-device machine learning framework — the deployment runtime that lets developers ship trained AI models inside iOS, iPadOS, macOS, watchOS, and tvOS apps. It handles the inference pipeline across CPU, GPU, and Neural Engine automatically, optimizing for power efficiency and memory footprint.

For developers building AI features into Apple-platform apps, Core ML is the typical deployment path. Models trained in PyTorch or TensorFlow are converted to Core ML format (.mlpackage) using coremltools — Apple's open-source unified conversion tool — and then deployed to apps where they run locally with no network required.

💡Key Concept

Core ML vs Apple Intelligence vs MLX: Apple Intelligence is the user-facing AI system (Writing Tools, Genmoji, Siri). Core ML is the developer-facing framework for deploying any custom model on-device — used for everything from custom image classifiers to embedded LLM features inside third-party apps. MLX is the newer Apple Silicon-specific framework for training and inference on Mac (especially LLMs). Different tools for different jobs: Core ML for shipping deployed models in apps; MLX for training and running models locally on Mac.

Tip

Visit Core ML: developer.apple.com/machine-learning/core-ml. The framework ships with Apple's platform SDKs at no charge, and coremltools is open source on GitHub.

Pricing & Access

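Core ML is part of Apple's platform SDKs and costs nothing to use; a paid Apple Developer Program membership is needed to distribute apps through TestFlight and the App Store.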

Apple Developer Program: $99/year
  • Core ML itself ships with the platform SDKs at no extra charge
  • App distribution + TestFlight
  • Required for App Store distribution
coremltools (open source): Free
  • Convert PyTorch / TensorFlow models
  • Validation + editing tools
  • BSD-3-Clause licensed
Xcode + Create ML: Free
  • Bundled with Apple development tools
  • Visual model training (Create ML)
  • Required for app development
On-Device Inference: No per-call cost
  • Runs on user's device
  • No cloud bills
  • Power and memory budgets apply
Apple Intelligence Foundation Models: Free for Apple Developer Program members
  • Use Apple Intelligence inside apps
  • No per-call costs
  • Different framework from Core ML

Core ML's economics are straightforward: there is no per-inference cost because models run on user devices. The only direct costs are the Apple Developer Program membership needed for distribution and the engineering effort to ship the app.

Core Capabilities

PyTorch and TensorFlow Conversion via coremltools

This is the standard deployment workflow: train your model in PyTorch or TensorFlow using your normal ML stack, then use the coremltools Python package to convert it to Core ML's .mlpackage format. The converted model is added to the Xcode project, where Xcode generates a typed Swift class for calling it.
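
As a rough illustration of the conversion step, the sketch below traces a PyTorch module and converts it with coremltools. It assumes `torch` and `coremltools` are installed; the function name `convert_to_mlpackage`, the input name `"input"`, and the output path are hypothetical choices made for this example, not anything prescribed by Apple.

```python
from pathlib import Path


def convert_to_mlpackage(torch_model, example_input, out_path="Model.mlpackage"):
    """Trace a PyTorch module and convert it to a Core ML ML Program package."""
    # Imported lazily so this file still loads on machines without the heavy
    # dependencies (an illustrative choice, not a coremltools requirement).
    import torch
    import coremltools as ct

    torch_model.eval()  # put dropout / batch-norm layers in inference mode
    traced = torch.jit.trace(torch_model, example_input)

    mlmodel = ct.convert(
        traced,
        # "input" is a hypothetical feature name; the shape comes from the example tensor.
        inputs=[ct.TensorType(name="input", shape=tuple(example_input.shape))],
        convert_to="mlprogram",            # ML Program format, saved as .mlpackage
        compute_units=ct.ComputeUnit.ALL,  # allow CPU, GPU, and Neural Engine
    )
    mlmodel.save(str(out_path))
    return Path(out_path)
```

With a trained module in hand, a call such as `convert_to_mlpackage(model, torch.rand(1, 3, 224, 224))` would write `Model.mlpackage` to the working directory; `model` and the input shape here are placeholders for whatever the real network expects.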

Hybrid Execution Plan (CPU + GPU + Neural Engine)

Core ML automatically generates a hybrid execution plan spanning the CPU, GPU, and Apple Neural Engine (ANE), selecting a compute resource for each layer of the model. Developers don't have to specify which engine to use: Core ML picks based on what's available and what it expects to be fastest. When needed, the allowed set can be restricted through the computeUnits option on MLModelConfiguration (for example, CPU only).
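
To make the idea of per-layer routing concrete, here is a toy sketch that picks the fastest supporting compute unit for each layer. It is not Core ML's actual planner, which is not described in this lesson and would also have to weigh factors such as data-transfer cost between units. The layer names and latency numbers are invented for illustration.

```python
# Hypothetical per-layer latency estimates in milliseconds.
# None means that compute unit cannot run the layer at all.
LAYER_COSTS = [
    ("conv1",     {"cpu": 4.0, "gpu": 1.2,  "ane": 0.4}),
    ("custom_op", {"cpu": 0.9, "gpu": None, "ane": None}),
    ("matmul",    {"cpu": 6.0, "gpu": 1.5,  "ane": 0.7}),
    ("softmax",   {"cpu": 0.3, "gpu": 0.5,  "ane": None}),
]


def plan_execution(layer_costs, allowed=("cpu", "gpu", "ane")):
    """Greedily assign each layer to the fastest allowed unit that supports it."""
    plan = []
    for name, costs in layer_costs:
        candidates = {
            unit: cost
            for unit, cost in costs.items()
            if unit in allowed and cost is not None
        }
        if not candidates:
            raise ValueError(f"no allowed compute unit can run layer {name!r}")
        best_unit = min(candidates, key=candidates.get)
        plan.append((name, best_unit))
    return plan


plan = plan_execution(LAYER_COSTS)
for layer, unit in plan:
    print(f"{layer:10s} -> {unit}")
```

Passing `allowed=("cpu", "gpu")` mimics restricting the compute units: layers that would have gone to the Neural Engine fall back to the next-fastest option.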

Neural Engine Acceleration

Modern Apple silicon includes a dedicated Neural Engine (ANE): purpose-built hardware for AI inference that is typically faster and more power-efficient than the CPU or GPU for the ML workloads it supports. The latest Neural Engines are rated at tens of TOPS (trillions of operations per second) while drawing very little power.

Low-Bit Quantization (macOS Sequoia and Beyond)

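macOS Sequoia (alongside iOS 18) introduced several low-bit compression methods supported by Core ML:

  • 4-bit block-wise linear quantization: each small block of weights gets its own scale, giving substantial memory and compute reduction
  • Channel group-wise palettization: an alternative technique in which weights become indices into a small lookup table shared by a group of channels

Both greatly reduce memory footprint and can improve latency on the Neural Engine.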
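
The block-wise idea can be sketched in a few lines of plain Python. This is a simplified symmetric scheme written for illustration, with signed 4-bit codes in the range -8 to 7 and one floating-point scale per block; the block size and the exact encoding Core ML uses are assumptions here, not taken from Apple's implementation.

```python
def quantize_blockwise_int4(weights, block_size=32):
    """Return (codes, scales): signed 4-bit codes plus one float scale per block."""
    codes, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block)
        # Map the largest magnitude in the block to code 7; avoid dividing by zero.
        scale = max_abs / 7 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in block)
    return codes, scales


def dequantize_blockwise(codes, scales, block_size=32):
    """Reconstruct approximate weights from the codes and per-block scales."""
    return [q * scales[i // block_size] for i, q in enumerate(codes)]


# Two blocks with very different magnitudes: each gets its own scale, so the
# large values in the second block do not wipe out the small ones in the first.
weights = [0.02, -0.01, 0.03, 0.00, 4.0, -3.5, 2.0, 1.0]
codes, scales = quantize_blockwise_int4(weights, block_size=4)
restored = dequantize_blockwise(codes, scales, block_size=4)
```

Because every block carries its own scale, the rounding error for any weight is bounded by half of that block's scale rather than half of a single model-wide scale.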

These methods are critical for shipping LLM-class models on-device: even small LLMs (1-3B parameters) benefit dramatically from 4-bit quantization when deployed to iPhone or iPad.
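
A quick back-of-the-envelope calculation shows why. The sketch below estimates weight storage only, assuming 16-bit weights as the uncompressed baseline and ignoring per-block scales, activations, and runtime overhead; the helper name `weight_memory_gb` is made up for this example.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (10**9 bytes), weights only."""
    return num_params * bits_per_weight / 8 / 1e9


for params in (1e9, 3e9):
    fp16 = weight_memory_gb(params, 16)
    int4 = weight_memory_gb(params, 4)
    print(f"{params / 1e9:.0f}B params: {fp16:.1f} GB at 16-bit -> {int4:.2f} GB at 4-bit")
```

Under these assumptions a 3B-parameter model drops from roughly 6 GB to roughly 1.5 GB of weight storage, a fourfold reduction.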

On-Device Llama 3.1 (Apple ML Research Demo)

Apple's Machine Learning Research team has published a guide to deploying Llama 3.1 on-device with Core ML, demonstrating that a capable open-weight LLM can run locally on Apple silicon with the right quantization and other optimizations. This is a meaningful capability statement: developers can ship AI features that don't require cloud infrastructure.

Create ML Visual Training

For developers without an ML background, Create ML (bundled with Xcode) provides a visual model-training interface covering tasks such as image classification, object detection, sentiment analysis, and custom recommendations. It produces Core ML-ready models without any PyTorch code.

Updateable Models

Core ML supports on-device model updates: a model exported as updatable can be fine-tuned with user-specific data on the device itself, preserving privacy. This is useful for personalization (custom recommendation models, user-specific fine-tunes) without cloud round-trips.

Multimodal Inputs

Core ML natively handles image, audio, video, and tabular data inputs — automatically managing preprocessing (resizing images, normalizing audio) so the developer's app code stays clean.
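
For image inputs, the normalization part of that preprocessing boils down to a per-channel scale and bias, which coremltools lets you attach at conversion time through the `scale` and `bias` arguments of `ct.ImageType`. The plain-Python sketch below only illustrates the arithmetic on a single pixel; the function name and the chosen values are hypothetical.

```python
def normalize_pixel(rgb, scale=1 / 255.0, bias=(0.0, 0.0, 0.0)):
    """Apply value * scale + bias to each channel of one RGB pixel."""
    return tuple(channel * scale + b for channel, b in zip(rgb, bias))


# Map 0..255 pixel values into the 0..1 range.
unit_range = normalize_pixel((255, 0, 128))

# Map 0..255 pixel values into the -1..1 range instead.
signed_range = normalize_pixel((0, 255, 0), scale=2 / 255.0, bias=(-1.0, -1.0, -1.0))
```

Baking these values into the model at conversion time means the app can pass an ordinary image and leave the arithmetic to the runtime.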

Strengths

  • No per-inference cost: The framework is free to use; the $99/year Apple Developer Program membership is needed only for distribution
  • Automatic hybrid execution: CPU + GPU + Neural Engine routing handled by the framework
  • Standard PyTorch / TensorFlow conversion: Use existing ML training stack; convert at deployment
  • Low-bit quantization: Ship LLM-class models on-device with 4-bit compression
  • Privacy: All inference happens on-device by default
  • Hundreds of millions of devices: Distribution scale through Apple's installed base
  • Mature ecosystem: Core ML has shipped in iOS since 2017, with substantial documentation and community support

Limitations & Considerations

  • Apple ecosystem only: Core ML doesn't run on Android / Windows / Linux — not a cross-platform framework
  • Conversion overhead: Some PyTorch / TensorFlow operations have no direct Core ML equivalent and require custom ops or graph rewriting
  • Memory budgets matter: Even with 4-bit quantization, larger LLMs strain iPhone memory; deployment requires careful sizing
  • Power consumption: Continuous inference can affect battery life; design for intermittent rather than always-on AI features
  • Less flexible than full PyTorch / TensorFlow: Core ML is optimized for inference, not training — for on-device training, MLX or PyTorch's mobile runtimes serve better
  • Apple Intelligence is separate: Apple Intelligence's Foundation Models framework is different from Core ML — different APIs, different deployment model

Best Use Cases

Use Case | Why Core ML Fits | Caveat
Custom model deployment in iOS / macOS apps | Standard conversion workflow + automatic device routing | Apple ecosystem only
On-device LLM features | 4-bit quantization makes small LLMs deployable | Memory and battery constraints
Image classification / object detection in apps | Core ML pipeline tuned for vision | Deploy size matters for App Store
Privacy-sensitive AI features | All inference on-device | Less flexible than cloud APIs
Personalization with on-device fine-tuning | Updateable models without cloud round-trip | Limited training capability vs full PyTorch

When to choose alternatives:

  • Cross-platform deployment → ONNX Runtime, TensorFlow Lite, PyTorch Mobile, MLC LLM
  • Heavy LLM workloads on Mac → MLX offers training + larger model support
  • Apple Intelligence features → use Apple Intelligence Foundation Models framework (different from Core ML)
  • Cloud AI APIs → OpenAI / Anthropic / Google for frontier-quality language models
  • Production training → use full PyTorch or TensorFlow in Python, deploy to Core ML

Key Takeaways

  • Core ML is Apple's on-device machine learning framework for deploying trained models across iPhone, iPad, Mac, Apple Watch, and Apple TV; it ships with Apple's SDKs at no charge, with Apple Developer Program membership needed for distribution
  • Standard workflow: train in PyTorch or TensorFlow, convert to .mlpackage via coremltools, deploy in Xcode app
  • Core ML automatically routes inference across CPU, GPU, and Neural Engine for optimal performance and power efficiency
  • macOS Sequoia introduced 4-bit block-wise linear quantization and channel group-wise palettization — enabling LLM-class models on-device with reduced memory footprint and improved Neural Engine latency
  • Best fit for shipping AI features inside iOS / macOS apps with privacy and no per-inference cost; for cross-platform deployment use ONNX Runtime or TensorFlow Lite; for LLM workloads on Mac, MLX is the more capable companion
