Koda Docs

API Server

koda serve
# Listening on http://127.0.0.1:11434

The server implements both the Ollama and OpenAI API protocols, so you can point any compatible client at it without code changes.

Ollama-Compatible Endpoints

List downloaded models

curl http://localhost:11434/api/tags

Text generation

curl http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "prompt": "Why is the sky blue?"}'
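
The same request can be sketched in Python using only the standard library. This is a minimal illustration, not part of Koda itself: the helper names `build_generate_request` and `generate` are hypothetical, and the `"response"` field of the reply is an assumption based on Ollama's API, which the server is described as implementing.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434"  # same address as in the curl examples


def build_generate_request(model: str, prompt: str, stream: bool = False) -> urllib.request.Request:
    """Build a POST request for /api/generate (hypothetical helper)."""
    payload = {"model": model, "prompt": prompt, "stream": stream}
    return urllib.request.Request(
        f"{BASE_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request and return the generated text.

    Assumes a non-streaming reply is a single JSON object with a
    "response" field, as in Ollama's API.
    """
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("llama3.2", "Why is the sky blue?")` should return the completion as a string, provided the assumptions above hold.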

Chat completion

curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Hello"}]}'
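
For a multi-turn conversation, Ollama's chat protocol expects the client to resend earlier turns in `messages` on every call. Assuming Koda follows that behavior, the bookkeeping could look like the sketch below. `build_chat_payload` and `record_reply` are hypothetical helper names, not part of any Koda library.

```python
from typing import Dict, List

Message = Dict[str, str]


def build_chat_payload(model: str, history: List[Message], user_text: str) -> dict:
    """Return a /api/chat request body with the new user turn appended.

    The caller's history list is left unmodified.
    """
    messages = history + [{"role": "user", "content": user_text}]
    return {"model": model, "messages": messages, "stream": False}


def record_reply(history: List[Message], user_text: str, reply_text: str) -> List[Message]:
    """Return a new history that includes the user turn and the assistant reply."""
    return history + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": reply_text},
    ]
```

Each payload is then sent as the JSON body of a POST to `/api/chat`, exactly as in the curl command.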

List running models

curl http://localhost:11434/api/ps

Pull a model

curl http://localhost:11434/api/pull \
  -H "Content-Type: application/json" \
  -d '{"name": "llama3.2"}'

Delete a model

curl -X DELETE http://localhost:11434/api/delete \
  -H "Content-Type: application/json" \
  -d '{"name": "llama3.2"}'
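
The pull and delete calls differ only in path and HTTP method, so a script that manages models might wrap both in one function. The sketch below mirrors the two curl commands; `build_model_request` is a hypothetical name, and the `{"name": ...}` body is taken directly from the examples above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434"  # same address as in the curl examples


def build_model_request(action: str, name: str) -> urllib.request.Request:
    """Build a request for /api/pull (POST) or /api/delete (DELETE)."""
    methods = {"pull": "POST", "delete": "DELETE"}
    if action not in methods:
        raise ValueError(f"unsupported action: {action!r}")
    return urllib.request.Request(
        f"{BASE_URL}/api/{action}",
        data=json.dumps({"name": name}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=methods[action],
    )
```

Pass the returned object to `urllib.request.urlopen` to execute it against a running server.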

OpenAI-Compatible Endpoints

The server works as a drop-in replacement for the OpenAI API in any client that supports a custom base URL.

Python client

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="koda"
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

List models

curl http://localhost:11434/v1/models

Chat completion

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Hello"}]}'
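
When calling this endpoint without the OpenAI client, the reply text sits at the same path the Python example reads: `choices[0].message.content`. A minimal sketch of extracting it from the raw JSON body follows. The `extract_reply` helper and the sample body are illustrative only; the sample is a trimmed stand-in in the OpenAI response shape, not captured output from Koda.

```python
import json


def extract_reply(body: str) -> str:
    """Return the assistant text from an OpenAI-style chat completion body."""
    data = json.loads(body)
    return data["choices"][0]["message"]["content"]


# Illustrative, trimmed response body (not real server output).
SAMPLE_BODY = json.dumps(
    {"choices": [{"index": 0, "message": {"role": "assistant", "content": "Hi!"}}]}
)

reply = extract_reply(SAMPLE_BODY)
```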

Streaming

Streaming works on both protocols: set "stream": true in the request body to receive the response incrementally.
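
The two protocols frame streamed output differently upstream: Ollama sends newline-delimited JSON objects, while OpenAI sends server-sent events whose `data:` lines end with a `[DONE]` marker. Assuming Koda reproduces both formats, a client could decode the lines it reads roughly as sketched below. Both function names are hypothetical, and the Ollama parser covers the `/api/generate` shape only.

```python
import json
from typing import Iterable, Iterator


def iter_ollama_chunks(lines: Iterable[str]) -> Iterator[str]:
    """Yield text pieces from newline-delimited JSON (/api/generate style)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("response"):
            yield event["response"]
        if event.get("done"):
            break


def iter_openai_chunks(lines: Iterable[str]) -> Iterator[str]:
    """Yield text pieces from server-sent events (/v1/chat/completions style)."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not data
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        choices = json.loads(data).get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta", {})
        if delta.get("content"):
            yield delta["content"]
```

Either generator can be fed the decoded lines of an open HTTP response, so text can be shown as it arrives instead of after the full reply.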
