Koda Docs

What is Koda?

Koda downloads and runs quantized LLMs on your machine. No cloud, no API keys, no Docker. It speaks the Ollama and OpenAI protocols, so any compatible client works out of the box.

Requirements

Python 3.12+
RAM: 4 GB minimum (8 GB+ recommended for 7B models)
Disk: varies by model — 2–5 GB per model
GPU: optional but recommended — CUDA (Linux/Windows) or Metal (Apple Silicon)

Install

macOS / Linux

curl -fsSL https://getkoda.dev/install.sh | bash

After install, reload your shell:

source ~/.bashrc   # or ~/.zshrc on zsh
koda version

Windows

irm https://getkoda.dev/install.ps1 | iex

Manual

git clone https://github.com/rjcuff/koda
cd koda
python3 -m venv .venv && source .venv/bin/activate
pip install -e .

GPU Support (optional)

CUDA (Linux / Windows)

CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --no-cache-dir
pip install -e .

Apple Silicon (Metal)

CMAKE_ARGS="-DGGML_METAL=on" pip install llama-cpp-python --no-cache-dir
pip install -e .

CPU-only works fine if you skip the above — inference is just slower.

Quick Start

# 1. Download a model
koda pull llama3.2

# 2. Chat interactively
koda run llama3.2

# 3. Or start the API server
koda serve

Type /bye to exit an interactive session.

Getting Started