API Server
Koda's API server implements both the Ollama and OpenAI API protocols, so you can point any compatible client at it without code changes. Start the server with:
koda serve
# Listening on http://127.0.0.1:11434
Ollama-Compatible Endpoints
List downloaded models
curl http://localhost:11434/api/tags
Text generation
curl http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "prompt": "Why is the sky blue?"}'
Chat completion
curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Hello"}]}'
List running models
curl http://localhost:11434/api/ps
Pull a model
curl http://localhost:11434/api/pull \
  -H "Content-Type: application/json" \
  -d '{"name": "llama3.2"}'
Delete a model
curl -X DELETE http://localhost:11434/api/delete \
  -H "Content-Type: application/json" \
  -d '{"name": "llama3.2"}'
OpenAI-Compatible Endpoints
The /v1 endpoints act as a drop-in replacement for the OpenAI API in any client that supports a custom base URL.
Python client
from openai import OpenAI
# Point the client at the local server instead of the default OpenAI endpoint.
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="koda",
)
response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
List models
curl http://localhost:11434/v1/models
Chat completion
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Hello"}]}'
Streaming
Streaming works on both protocols: set "stream": true in the request body.
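As a rough illustration of consuming a streamed response, the sketch below reads chunks from either protocol using only the Python standard library. The chunk formats are assumptions carried over from the upstream protocols rather than anything stated on this page: newline-delimited JSON objects with a "response" field and a final "done": true for the Ollama-style endpoint, and "data: ..." server-sent events ending in "data: [DONE]" for the OpenAI-style endpoint. The helper names (iter_ollama_chunks, iter_openai_chunks, stream_generate) are hypothetical and not part of Koda.

```python
import json
import urllib.request
from typing import Iterable, Iterator

BASE_URL = "http://localhost:11434"  # address used throughout this page


def iter_ollama_chunks(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from a newline-delimited JSON stream.

    Assumption: each line is a JSON object with a "response" string,
    and the last object has "done": true (upstream Ollama format).
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        if chunk.get("response"):
            yield chunk["response"]
        if chunk.get("done"):
            break


def iter_openai_chunks(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from a server-sent-events stream.

    Assumption: each event is a line "data: {json}" whose
    choices[0].delta may hold "content", and the stream ends with
    "data: [DONE]" (upstream OpenAI format).
    """
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # ignore blank separators and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        choices = json.loads(payload).get("choices") or []
        if choices:
            content = (choices[0].get("delta") or {}).get("content")
            if content:
                yield content


def stream_generate(model: str, prompt: str) -> Iterator[str]:
    """POST to /api/generate with "stream": true and yield text as it arrives."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    # The response object is iterable line by line, so chunks are
    # decoded as soon as the server sends them.
    with urllib.request.urlopen(request) as response:
        yield from iter_ollama_chunks(response)
```

With the server running, looping over `stream_generate("llama3.2", "Why is the sky blue?")` and printing each fragment with `end=""` should show the answer as it is generated, provided the chunk format matches the assumption above. The two parsers are kept separate from the network call so they can be checked against captured output without a live server.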