Launching Q3 2026 — Join the waitlist

Your Intelligence,
Truly Yours.

A complete local AI workspace for Apple Silicon. 22 agents, visual Teams, 100+ models, your own OpenAI-compatible server. No cloud. No subscription. No leaks.

Be among the first 200 to lock in $29 Early Bird instead of $49.

[Screenshot: AiLocally main chat view with sidebar, active agent, and live system stats]

Bug Hunter ✦ SQL Wizard ✦ Translator ✦ Code Reviewer ✦ SEO Optimizer ✦ Email Writer ✦ Writing Assistant ✦ Researcher ✦ Paper Reader ✦ Meeting Notes ✦ Data Analyst ✦ Twitter Threader ✦ LinkedIn Strategist ✦ Market Researcher ✦ Privacy Auditor ✦ Refactoring Coach ✦ Doc Writer ✦ Image Captioner ✦ Project Inspector ✦ Agent Architect

22

Built-in agents

Ready to install

100+

MLX-quantized models

HuggingFace browser

8

UI languages

Native localized

0 KB

Data uploaded

Ever

Six surfaces.
One quiet promise.

Scroll through. Every pixel runs on your Mac.

Chat

Talk to AI without uploading a thing.

A clean native chat with conversation history, active-agent indicator, and instant model switching. Every word stays on your Mac.

[Screenshot: AiLocally chat interface with conversation list and active agent indicator]

Catalog

100+ models. One click each.

Browse local quantized models or search HuggingFace Hub directly. Download Llama, Qwen, Gemma, Mistral, Phi — all MLX-optimized for Apple Silicon.

[Screenshot: AiLocally model catalog showing HuggingFace search with MLX-quantized models]

Agents

22 agents pre-installed. Yours to extend.

Each agent is a curated prompt + tools + memory + ideal-model preset. Tuned for the task. Editable. Composable into Teams.

[Screenshot: AiLocally agents library modal with 22 predefined agents]

Teams

Wire your AI workforce.

Drag agents onto a canvas. Connect their outputs. Run the whole graph with one click. Built-in templates for content, code review, research.

[Screenshot: AiLocally Teams canvas with connected agents forming a content pipeline]

Memory

Persistent. Categorized. Yours.

Long-term memory stored as Preferences, Facts, Decisions, or Notes. Per-agent. Searchable. Never synced.

[Screenshot: AiLocally memory tab with Preference / Fact / Decision / Note filters]

Model Catalog

Hundreds of models.
One click each.

Browse your locally-stored models or search the HuggingFace Hub directly inside the app. Every model is MLX-quantized for Apple Silicon — download, click Run, done. No terminals, no python, no waiting on conversions.

[Screenshot: AiLocally model catalog with HuggingFace search and MLX-quantized models]

MLX-native, every time.

We index thousands of MLX-quantized weights. No conversion step. No CUDA. Just download-and-go on M-series silicon.

Llama 3.3 70B · Qwen 3.5 27B · Devstral Small 24B · Gemma 3 4B · Kimi K2.6 · gpt-oss 20B · Phi-4 14B · Mistral 7B · Kokoro 82M · parakeet-tdt 0.6B · +90 more

Resource-aware

Auto-unload after N minutes of inactivity. The model evicts itself from unified memory so the rest of your Mac stays snappy. Configurable per-session.
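
For the curious, idle eviction can be pictured roughly like this. It is a minimal sketch, not AiLocally's actual code: the `IdleUnloader` class, its method names, and the injectable clock are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional


class IdleUnloader:
    """Tracks the last use of a loaded model and reports when to evict it."""

    def __init__(self, idle_minutes: float, clock: Callable[[], float] = time.monotonic):
        self.idle_seconds = idle_minutes * 60
        self.clock = clock  # injectable so the timeout can be tested without waiting
        self.loaded_model: Optional[str] = None
        self.last_used = 0.0

    def use(self, model: str) -> None:
        """Record activity for a model (hypothetical: loading is implied)."""
        self.loaded_model = model
        self.last_used = self.clock()

    def tick(self) -> Optional[str]:
        """Return the evicted model's name once it has been idle long enough."""
        if self.loaded_model is None:
            return None
        if self.clock() - self.last_used >= self.idle_seconds:
            evicted, self.loaded_model = self.loaded_model, None
            return evicted
        return None
```

A periodic timer would call `tick()`, and every request would call `use()`, so an active model never reaches the timeout.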

22 Agents

A full library, ready
the moment you open the app.

Curated prompts, tool wiring, and ideal-model presets — all editable in plain Markdown. Open the library, click Install, start working. No prompt engineering required.
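
To give a feel for what "editable in plain Markdown" could mean, here is a small sketch that reads an agent from a text file. The file layout (a few `key: value` lines, a blank line, then the prompt), the `Agent` fields, and the tool names are hypothetical; the real file format may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    model: str
    tools: list[str] = field(default_factory=list)
    prompt: str = ""


def parse_agent(markdown: str) -> Agent:
    """Parse a hypothetical agent file: 'key: value' lines, a blank line, then the prompt."""
    header, _, body = markdown.strip().partition("\n\n")
    meta = {}
    for line in header.splitlines():
        key, _, value = line.partition(":")
        meta[key.strip().lower()] = value.strip()
    tools = [t.strip() for t in meta.get("tools", "").split(",") if t.strip()]
    return Agent(name=meta["name"], model=meta.get("model", ""), tools=tools, prompt=body.strip())


# Illustrative content only; the tool names are made up.
EXAMPLE = """\
name: Bug Hunter
model: llama-3.3-70b-instruct
tools: read_file, search_code

Read the code and look for subtle bugs, race conditions, and edge cases.
"""

agent = parse_agent(EXAMPLE)
print(agent.name, agent.tools)
```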

[Screenshot: AiLocally agents library modal showing 22 installable predefined agents]

Bug Hunter

Engineering

Reads code looking for subtle bugs, race conditions, edge cases.

code review quality

SQL Wizard

Engineering

Writes optimized SQL queries and explains execution plans.

sql database query

Project Inspector

Engineering

Explores project structure, reads configs, diagnoses issues.

inspect diagnose explore

Refactoring Coach

Engineering

Proposes targeted refactorings without over-engineering.

refactor clean-code patterns

Code Reviewer

Engineering

PR-grade review with security focus and style consistency.

review security style

Doc Writer

Engineering

Generates API docs and inline comments from signatures.

docs api

Translator

Language

Translates between languages preserving tone and context.

translation i18n

Summarizer

Language

Condenses long text into actionable bullet points.

summary tldr

Writing Assistant

Language

Tone shifts, grammar, clarity. Your voice, sharpened.

writing editing

Email Writer

Language

Drafts professional emails in the right tone.

email communication

SEO Optimizer

Marketing

Audits and rewrites web content for search intent.

seo marketing content

LinkedIn Strategist

Marketing

Crafts posts tuned for LinkedIn engagement.

linkedin social

Twitter Threader

Marketing

Writes viral-shaped threads in X/Twitter format.

twitter x threads

Market Researcher

Marketing

Investigates markets, competitors, trends.

research market

Researcher

Knowledge

Synthesizes across multiple documents. Cites sources.

research synthesis

Paper Reader

Knowledge

Reads academic papers and extracts the contributions.

paper academic

Meeting Notes

Knowledge

Transcripts in — decisions + action items + speakers out.

notes meetings

Data Analyst

Knowledge

Drop a CSV. Get charts, anomalies, plain-English insights.

data analysis

Image Captioner

Vision

Vision-model alt text and image descriptions.

vision alt-text

Privacy Auditor

Security

Scans text for PII before you paste it anywhere.

privacy pii

Email Triage

Productivity

Sorts inbox by urgency and proposes 1-line responses.

email triage

Agent Architect

Wire your AI workforce.
Run it with one click.

Drag agents onto a canvas. Connect their outputs. Save the team. Run the whole graph whenever you need that output. No scripting unless you want to.

Content Pipeline

Researcher → Writer → Editor → SEO → Publisher

Code Review

Reviewer → Bug Hunter → Doc Writer

Inbox Triage

Email Triage → Email Writer → Privacy Auditor

Research Brief

Researcher × N → Synthesizer → Writer

Social Strategy

Market Researcher → LinkedIn → Twitter → Scheduler

Paper Digest

Paper Reader → Summarizer → Doc Writer

Templates ship with the app. Customize them, save your own, or share JSON with a teammate.
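
As an illustration of how a shared team file might be run, the sketch below loads a JSON description of the Code Review template and executes each agent after the agents feeding into it. The JSON shape, the `run_team` helper, and the stub agents are assumptions for this example, not the app's documented format.

```python
import json
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical team file: directed edges from one agent's output to the next agent's input.
TEAM_JSON = """
{
  "name": "Code Review",
  "edges": [["Reviewer", "Bug Hunter"], ["Bug Hunter", "Doc Writer"]]
}
"""


def run_team(team: dict, agents: dict[str, Callable[[str], str]], task: str) -> dict[str, str]:
    """Run every agent after its upstream agents, feeding it their joined outputs."""
    upstream: dict[str, list[str]] = {}
    for src, dst in team["edges"]:
        upstream.setdefault(src, [])
        upstream.setdefault(dst, []).append(src)

    outputs: dict[str, str] = {}
    # TopologicalSorter takes a node -> predecessors mapping and yields a valid run order.
    for node in TopologicalSorter(upstream).static_order():
        inputs = [outputs[p] for p in upstream[node]] or [task]
        outputs[node] = agents[node]("\n".join(inputs))
    return outputs


# Stub agents that only tag their input; a real agent would call a local model.
stubs = {name: (lambda text, n=name: f"[{n}] {text}") for name in ("Reviewer", "Bug Hunter", "Doc Writer")}
result = run_team(json.loads(TEAM_JSON), stubs, "diff --git a/app.py b/app.py")
print(result["Doc Writer"])
```

Agents with no upstream connection receive the original task, which is how a fan-out such as "Researcher × N" could start from one prompt.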

Memory

Long-term memory,
not a sliding window.

Things you tell the model should stick — but stay organized. AiLocally splits memory into four categories so the right context surfaces at the right moment. All on-device, searchable, and yours to wipe with one click.

Preference

How you like answers framed. Tone. Verbosity. Naming conventions.

Fact

Stable truths about you, your projects, your stack, your codebase.

Decision

Choices already made that future answers should respect.

Note

Anything else worth surfacing later. Free-form context.
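
The four categories above map naturally onto a small data model. The following is only a sketch of the idea, held in memory with a plain substring search; the class and method names are hypothetical and say nothing about how AiLocally stores or ranks memories.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    PREFERENCE = "Preference"
    FACT = "Fact"
    DECISION = "Decision"
    NOTE = "Note"


@dataclass(frozen=True)
class Memory:
    agent: str
    category: Category
    text: str


class MemoryStore:
    """In-memory stand-in for a per-agent, searchable memory store."""

    def __init__(self) -> None:
        self._items: list[Memory] = []

    def remember(self, agent: str, category: Category, text: str) -> None:
        self._items.append(Memory(agent, category, text))

    def search(self, agent: str, query: str, category: Optional[Category] = None) -> list[Memory]:
        """Case-insensitive substring match, scoped to one agent and optionally one category."""
        q = query.lower()
        return [
            m for m in self._items
            if m.agent == agent
            and (category is None or m.category is category)
            and q in m.text.lower()
        ]

    def wipe(self) -> None:
        """Forget everything, mirroring the one-click wipe."""
        self._items.clear()


store = MemoryStore()
store.remember("Code Reviewer", Category.PREFERENCE, "Use snake_case for Python names")
store.remember("Code Reviewer", Category.DECISION, "We chose PostgreSQL over SQLite")
store.remember("Email Writer", Category.PREFERENCE, "Keep emails under 100 words")
```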

Performance

Built native.
Faster than the API.

Written in Swift. Powered by MLX. On Apple Silicon, the CPU and GPU share one pool of unified memory, so model weights are never copied between them and generation is not slowed by transfers across a bus.

1.5×

faster than llama.cpp

on MLX-native models

~50ms

time to first token

no network round-trip

0 KB

data uploaded

measured with nettop

Tokens per second — generation speed

Higher is better · context 2k · prompt eval excluded

AiLocally on M4 Max

Llama 3.3 70B · Q4 · MLX-native

32 tok/s

llama.cpp on M4 Max

Same model · Q4 · Metal backend

21 tok/s

AiLocally on M3 Pro

Llama 3.3 70B · Q4 · MLX-native

19 tok/s

AiLocally on M2

Llama 3.2 3B · Q4 · MLX-native

58 tok/s

Benchmarks on macOS 15.4, 64GB unified memory, room-temperature ambient. Models loaded from local SSD. Results vary by ±10% per run.
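
The 1.5× headline can be checked against the two M4 Max rows of the chart. The short calculation below uses only the published figures; the variable names are ours, and the range assumes the stated ±10% applies independently to both measurements.

```python
# Figures copied from the chart (tokens per second, Llama 3.3 70B, Q4, M4 Max).
ailocally_tps = 32
llama_cpp_tps = 21

speedup = ailocally_tps / llama_cpp_tps
print(f"speedup: {speedup:.2f}x")

# Applying the stated run-to-run variation of 10% in opposite directions
# gives a worst-case and best-case ratio.
low = (ailocally_tps * 0.9) / (llama_cpp_tps * 1.1)
high = (ailocally_tps * 1.1) / (llama_cpp_tps * 0.9)
print(f"range: {low:.2f}x to {high:.2f}x")
```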

100% LOCAL

Your data
never leaves your Mac.

Cloud AI promised convenience. It delivered surveillance, lock-in, and rented intelligence. We took the other path.

Activity Monitor — Network

Filtered: AiLocally

$ nettop -p AiLocally

→ Bytes in: 0 KB

→ Bytes out: 0 KB

→ Established connections: 0

✓ Verified offline.

No data ever leaves your Mac

Every prompt, every conversation, every memory is processed on-device. We have no servers to leak.

No telemetry, no analytics

Open Activity Monitor while AiLocally runs. You will see zero outbound calls. Verifiable.

No account, no email required

You install the app. You use it. That is the entire data flow.

Read the privacy whitepaper

Developer API

Your private OpenAI,
on `localhost:8080`.

Same endpoints. Same request format. Same streaming semantics as api.openai.com. Flip the toggle in Settings and every tool that speaks OpenAI just works — pointed at your Mac.

[Screenshot: AiLocally settings showing server toggle on port 8080 with theme and language options]

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-instruct",
    "messages": [
      {"role": "user", "content": "Summarize this PR diff."}
    ],
    "stream": true
  }'

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"  # local, no auth
)

stream = client.chat.completions.create(
    model="llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Summarize this PR diff."}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "not-needed",
});

const stream = await client.chat.completions.create({
  model: "llama-3.3-70b-instruct",
  messages: [{ role: "user", content: "Summarize this PR diff." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

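import Foundation

let url = URL(string: "http://localhost:8080/v1/chat/completions")!
var request = URLRequest(url: url)
request.httpMethod = "POST"
request.setValue("application/json", forHTTPHeaderField: "Content-Type")

// The payload mixes strings, an array, and a Bool, so its type is [String: Any].
// JSONEncoder cannot encode that; JSONSerialization can.
let payload: [String: Any] = [
    "model": "llama-3.3-70b-instruct",
    "messages": [["role": "user", "content": "Summarize this PR diff."]],
    "stream": true
]
request.httpBody = try JSONSerialization.data(withJSONObject: payload)

// Each streamed line arrives as a server-sent event of the form "data: {json}".
let (bytes, _) = try await URLSession.shared.bytes(for: request)
for try await line in bytes.lines {
    print(line)
}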
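
If you read the stream without an SDK, as the Swift sample does, each line still has to be decoded. Here is a minimal sketch, assuming the local server uses the same `data: {...}` framing and `[DONE]` sentinel as api.openai.com; the `iter_deltas` helper and the sample lines are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style streaming lines ('data: {...}')."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample lines standing in for a real response body.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Adds "}}]}',
    'data: {"choices": [{"delta": {"content": "retry logic."}}]}',
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))
```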

Drop-in replacement — works with everything that speaks OpenAI

Cursor (IDE) · Continue.dev (IDE) · Aider (CLI) · Open WebUI (UI) · LangChain (SDK) · LlamaIndex (SDK) · Zed (IDE) · CrewAI (SDK)

Built right.

The technical choices most people will never notice — and the ones the discerning ones will.

Native Swift core

No Electron. No wrappers. Written in Swift for Apple Silicon from the first line.

MLX framework

Apple's ML stack. Unified memory. No CPU-GPU copies. Hardware-rate token generation.

OpenAI-compatible API

localhost:8080/v1 — drop in for Cursor, Continue, Aider, LangChain, anything else.

Persistent memory

Long-term context split across Preference, Fact, Decision, Note. Searchable. Yours.

Live resource stats

CPU, RAM, GPU and active-model footprint visible in the sidebar at all times.

Auto-unload after N min

Inactive models evict themselves from unified memory. The rest of your Mac stays snappy.

8 UI languages

English · Español · 中文 · Deutsch · Français · العربية · Português · 日本語.

Sparkle auto-updates

Industry-standard Mac update framework. Signed releases, cryptographically verified.

Honest comparison

We're not the only option.
We're the right one for this job.

If you want a free CLI to run a model, Ollama is great. If you want a chat app and don't mind cloud, ChatGPT works. AiLocally is for the spot in between — local-first, opinionated, ready to use, one-time priced.

| Feature | AiLocally | LM Studio | Ollama | ChatGPT Plus | Cloud API |
| --- | --- | --- | --- | --- | --- |
| 100% local: no prompt ever leaves your machine | | | | | |
| Native Mac app: not Electron, not a browser tab | | Partial | | Partial | |
| Built-in agents: curated, ready to use | 22 | | | GPTs | |
| Visual pipelines: chain agents in a graph | | | | | |
| OpenAI-compatible API: Cursor, Continue, Aider work out of the box | | | | | |
| MLX-native speed: unified-memory optimized | | Partial | | n/a | n/a |
| Conversation memory: persistent, per-agent | | | | | |
| Pricing: one-time vs subscription | $49 once | Free | Free | $20/mo | Pay-per-token |

Comparison reflects feature scopes as of mid-2026. Competitors are great products — pick the one that fits your workflow.

One-time pricing.
Yours forever.

No subscription. No usage limits. Buy once, use forever — on every Mac you own.

Save $20

Early Bird

The same as Personal, but locked in at the launch discount.

$29

lifetime · first 200 only

1 user license
All built-in agents
All future updates included
OpenAI-compatible API
Sparkle auto-updates
Priority support

Claim Early Bird

Personal

Everything you need for personal & freelance work.

$49

lifetime · one-time

1 user license
All built-in agents
All future updates included
OpenAI-compatible API
Sparkle auto-updates
Email support

Join waitlist

Teams

For dev teams, agencies, and small studios.

$129

lifetime · 3 seats · one-time

3 user licenses
All built-in agents
All future updates included
OpenAI-compatible API
Shared pipelines (Phase 2)
Email support

Join waitlist

30-day money-back guarantee · No questions asked · Pay with card, PayPal, or crypto

Questions, answered.

Did not see your question? Email hello@ai-locally.com.

Do I need an internet connection to use AiLocally?

Only twice: once to download the app and a model, and once during license activation. After that it works fully offline — including model loading, inference, agents, and pipelines. You can run it on a plane, in a tunnel, or with Little Snitch blocking every outbound request.

Which Macs are supported?

Apple Silicon only: M1, M2, M3, M4 and successors. 16GB RAM minimum for 7B–13B models. 32GB+ recommended for 70B-class quantized models. Intel Macs are not supported — MLX requires unified memory architecture.
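
As a rough rule of thumb, the size of a quantized model's weights is its parameter count times the bits per weight, divided by eight. The sketch below applies that arithmetic; the `weights_gb` helper is ours, and it ignores the KV cache, quantization metadata, and what macOS itself needs, so treat the numbers as a lower bound rather than a sizing guarantee.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


for name, params, bits in [("7B", 7, 4), ("13B", 13, 4), ("70B", 70, 4), ("70B", 70, 3)]:
    print(f"{name} at {bits}-bit: ~{weights_gb(params, bits):.1f} GB of weights")
```

By this estimate a 70B model at 4-bit needs about 35 GB for weights alone, so on a 32GB Mac a lower-bit quantization would be the realistic choice for that class.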

How does this compare to LM Studio or Ollama?

They are great if you want a free engine to run models. AiLocally is a complete product on top of that: 22 curated agents, visual pipelines, persistent memory, a native Mac UI, an OpenAI-compatible API server, and Sparkle auto-updates. The full comparison table is in the Honest comparison section above.

Will Apple Intelligence make this obsolete?

Apple Intelligence is a different product. It is a sandboxed feature set inside Apple apps, with no API for developers, no way to load arbitrary open models, and no agent system. If Apple ever ships something equivalent, every Personal license includes a 90-day refund for the first version after that announcement — but we believe complementary tools will keep their place.

Can I use AiLocally commercially?

Yes. The Personal license is per developer — use it at your job, in side projects, with paying clients, no separate commercial fee. The Teams license is for shared use across multiple seats. We do not collect a revenue share or impose usage limits.

What models can I run?

Anything on HuggingFace that has an MLX-compatible quantization. The model browser inside the app suggests well-tested ones (Llama 3.x family, Qwen 2.5, Phi-4, Mistral, Gemma 2). You can also point AiLocally at a local GGUF or safetensors file you already have.

What if I am not technical?

You do not need to be. The agents are ready to use: pick one, type, get an answer. The Markdown agent editor is optional. The Teams canvas has templates you can run without editing.
