Getting Started
Install Koda and run your first model in under a minute.
What is Koda?
Koda downloads and runs quantized LLMs on your machine. No cloud, no API keys, no Docker. It speaks the Ollama and OpenAI protocols, so any compatible client works out of the box.
Requirements
- Python 3.12+
- RAM: 4 GB minimum (8 GB+ recommended for 7B models)
- Disk: varies by model — 2–5 GB per model
- GPU: optional but recommended — CUDA (Linux/Windows) or Metal (Apple Silicon)
Install
macOS / Linux
curl -fsSL https://getkoda.dev/install.sh | bashAfter install, reload your shell:
source ~/.bashrc # or ~/.zshrc on zsh
koda versionWindows
irm https://getkoda.dev/install.ps1 | iexManual
git clone https://github.com/rjcuff/koda
cd koda
python3 -m venv .venv && source .venv/bin/activate
pip install -e .GPU Support (optional)
CUDA (Linux / Windows)
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --no-cache-dir
pip install -e .Apple Silicon (Metal)
CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python --no-cache-dir
pip install -e .CPU-only works fine if you skip the above — inference is just slower.
Quick Start
# 1. Download a model
koda pull llama3.2
# 2. Chat interactively
koda run llama3.2
# 3. Or start the API server
koda serveType /bye to exit an interactive session.