How to set up Ollama to run AI models locally and integrate them into your development workflow alongside Claude Code.
What is Ollama?
Ollama is an open-source tool that lets you run large language models locally on your own machine — no API keys, no internet connection required, no usage costs. It wraps models like llama3, deepseek-coder, qwen2.5-coder, and many others into a simple CLI and a local HTTP server.
Think of it as Docker, but for AI models.
What is Claude Code?
Claude Code is Anthropic’s official CLI tool for agentic coding. It connects to Claude’s API and can read your codebase, write files, run commands and work autonomously on complex tasks directly from your terminal.
Why use both?
Claude Code and Ollama serve different roles:
- Claude Code → best for complex reasoning, architecture decisions, multi-file refactors and tasks that need the full power of Claude Sonnet or Opus
- Ollama → best for quick, offline completions, privacy-sensitive codebases, or when you want fast local responses with no API costs
Using both gives you flexibility: reach for Claude Code when you need the best results, and Ollama when you’re offline, iterating fast, or working with sensitive code that can’t leave your machine.
Installing Ollama
```shell
# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Or download the macOS app from https://ollama.com/download
```
Verify it’s running:
```shell
ollama --version
```
Pulling a coding model
```shell
# Great for code — fast and capable
ollama pull qwen2.5-coder:7b

# Larger, more powerful option
ollama pull deepseek-coder-v2:16b

# General purpose with strong code support
ollama pull llama3.1:8b
```
List your installed models:
```shell
ollama list
```
Running a model in the terminal
```shell
# Interactive chat
ollama run qwen2.5-coder:7b

# One-shot prompt
ollama run qwen2.5-coder:7b "Explain this function: $(cat src/utils/format.ts)"
```
Using Ollama’s API
Ollama exposes a REST API on http://localhost:11434. Its native endpoints live under /api, and it also serves an OpenAI-compatible API under /v1. You can query the native API directly:
```shell
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Write a TypeScript function to debounce a callback",
  "stream": false
}'
```
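The same call works from code. Here is a minimal sketch using only Python's standard library — the helper names (`build_generate_request`, `generate`) are illustrative, not part of Ollama itself:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama server address


def build_generate_request(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's native /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str) -> str:
    """POST a prompt to a locally running Ollama server and return the response text."""
    body = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # With "stream": false, the full completion arrives in one JSON object
        return json.loads(resp.read())["response"]
```

Setting `"stream": false` keeps the example simple; omit it and Ollama streams newline-delimited JSON chunks instead.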
Or use the OpenAI-compatible endpoint:
```shell
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:7b",
    "messages": [{ "role": "user", "content": "Explain useEffect in React" }]
  }'
```
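Because this endpoint follows the OpenAI chat-completions shape, any OpenAI-style client can talk to it. A stdlib-only sketch (the `chat` helper is illustrative; replies are read from `choices[0].message.content`, per the OpenAI response format):

```python
import json
import urllib.request


def build_chat_request(model: str, user_content: str) -> dict:
    """Build a request body for the OpenAI-compatible /v1/chat/completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }


def chat(model: str, user_content: str,
         base_url: str = "http://localhost:11434") -> str:
    """Send one user message to a local Ollama server and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_request(model, user_content)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```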
Integrating with VS Code via Continue
Continue is a VS Code and JetBrains extension that turns any local model into an in-editor AI assistant. It works natively with Ollama.
Install Continue from the VS Code marketplace, then configure it in ~/.continue/config.json:
```json
{
  "models": [
    {
      "title": "Qwen 2.5 Coder (local)",
      "provider": "ollama",
      "model": "qwen2.5-coder:7b",
      "apiBase": "http://localhost:11434"
    }
  ],
  "tabAutocompleteModel": {
    "title": "Qwen autocomplete",
    "provider": "ollama",
    "model": "qwen2.5-coder:7b"
  }
}
```
Now you get inline completions and chat inside VS Code — fully local, fully private.
My hybrid workflow
- Quick questions / offline work → Ollama (Continue in VS Code)
- Complex refactors / new features → Claude Code (Anthropic API)
- Code review / architecture → Claude Code
- Sensitive / private codebases → Ollama
The key insight: you don’t have to choose. Claude Code handles the heavy lifting and Ollama handles the day-to-day without burning tokens.
Useful Ollama commands
```shell
# Start the Ollama server manually
ollama serve

# See running models
ollama ps

# Remove a model
ollama rm deepseek-coder-v2:16b

# Pull a specific version
ollama pull qwen2.5-coder:14b
```
Conclusion
Running AI locally with Ollama is not a replacement for Claude Code — it’s a complement. Local models give you privacy, zero cost and instant responses for routine tasks. Claude Code gives you the reasoning power of frontier models when it actually matters.
Set up both, define when to use each, and you’ll have the best of both worlds in your dev workflow.