Quick Start

Lyumen is a free OpenAI-compatible LLM inference API. No account. No API key. Just send requests.

curl

bash
curl https://lyumen-api.okotto.workers.dev/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3-flash",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Python

python
import requests

response = requests.post(
    "https://lyumen-api.okotto.workers.dev/v1/chat/completions",
    json={
        "model": "gemini-3-flash",
        "messages": [{"role": "user", "content": "Hello!"}]
    }
)
print(response.json())

JavaScript

javascript
fetch("https://lyumen-api.okotto.workers.dev/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gemini-3-flash",
    messages: [{ role: "user", content: "Hello!" }]
  })
}).then(res => res.json()).then(console.log);

How It Works

Requests are proxied to inference backends, and every request and response is logged to a database. Logged data may be sold as AI training data. Transparent, no surprises.

POST /v1/chat/completions

Create a chat completion for the given messages.

Parameter   Type     Required  Description
model       string   yes       Model ID from /v1/models
messages    array    yes       OpenAI-format messages array
stream      boolean  no        Enable SSE streaming (default: false)
max_tokens  integer  no        Maximum number of tokens to generate
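With stream set to true, the response arrives as Server-Sent Events. As a minimal sketch, assuming Lyumen follows the standard OpenAI chunk format (each data: line carries a choices[0].delta fragment, and the stream ends with data: [DONE]):

```python
import json
import requests

def parse_sse_line(line: str):
    """Return the text fragment from one OpenAI-style SSE line, or None."""
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")

def stream_chat(model: str, messages: list) -> None:
    """Print a streamed completion fragment by fragment."""
    resp = requests.post(
        "https://lyumen-api.okotto.workers.dev/v1/chat/completions",
        json={"model": model, "messages": messages, "stream": True},
        stream=True,
    )
    for line in resp.iter_lines(decode_unicode=True):
        text = parse_sse_line(line or "")
        if text:
            print(text, end="", flush=True)
```

The exact chunk shape is an assumption based on the OpenAI spec; if Lyumen's backends deviate from it, adjust parse_sse_line accordingly.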

Example Request

JSON
{
  "model": "gemini-3-flash",
  "messages": [
    {"role": "user", "content": "Hi"}
  ]
}

Example Response

JSON
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gemini-3-flash",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help?"
    },
    "finish_reason": "stop",
    "index": 0
  }]
}
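The assistant's text lives at choices[0].message.content. A small helper (illustrative, not part of any SDK) pulls it out of the parsed response shown above:

```python
def extract_reply(completion: dict) -> str:
    """Return the assistant message text from a chat.completion object."""
    return completion["choices"][0]["message"]["content"]

# The example response from the documentation, trimmed to the used fields.
example = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "choices": [{
        "message": {"role": "assistant", "content": "Hello! How can I help?"},
        "finish_reason": "stop",
        "index": 0,
    }],
}
print(extract_reply(example))  # Hello! How can I help?
```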

GET /v1/models

Retrieve a list of available models.

Response

JSON
{
  "data": [
    { "id": "gemini-3-flash", "object": "model" },
    { "id": "minimax-m2.7", "object": "model" }
  ]
}
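To pick a model programmatically, GET this endpoint and read the id fields. The helper below is a sketch that operates on the response shape shown above (the sample payload mirrors the documented response; fetch the live list with requests.get on the same URL):

```python
def model_ids(payload: dict) -> list:
    """Extract model IDs from a /v1/models response body."""
    return [m["id"] for m in payload.get("data", [])]

# Sample payload copied from the documented response.
sample = {
    "data": [
        {"id": "gemini-3-flash", "object": "model"},
        {"id": "minimax-m2.7", "object": "model"},
    ]
}
print(model_ids(sample))  # ['gemini-3-flash', 'minimax-m2.7']
```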

GET /v1/logs/count

Get the total number of requests processed by Lyumen.

Response

JSON
{ "count": 42 }

curl

Standard curl examples for Lyumen.

Non-streaming

bash
curl https://lyumen-api.okotto.workers.dev/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3-flash",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Streaming

bash
curl https://lyumen-api.okotto.workers.dev/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-3-flash",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

Python

You can use Lyumen with the standard requests library or the official OpenAI Python SDK.

Using Requests

python
import requests

url = "https://lyumen-api.okotto.workers.dev/v1/chat/completions"
data = {
    "model": "gemini-3-flash",
    "messages": [{"role": "user", "content": "How are you?"}]
}
response = requests.post(url, json=data)
print(response.json())

Using OpenAI SDK

python
from openai import OpenAI

client = OpenAI(
    base_url="https://lyumen-api.okotto.workers.dev/v1",
    api_key="lyumen" # Any string works
)

completion = client.chat.completions.create(
    model="gemini-3-flash",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(completion.choices[0].message.content)

RooCode

Step-by-step configuration for RooCode:

  • Open RooCode settings
  • Set API Provider to OpenAI Compatible
  • Base URL: https://lyumen-api.okotto.workers.dev
  • API Key: any string, e.g. lyumen
  • Model: gemini-3-flash

VS Code / Continue

Add the following to your config.json for the Continue extension:

config.json
{
  "models": [
    {
      "title": "Lyumen Gemini",
      "provider": "openai",
      "model": "gemini-3-flash",
      "apiKey": "lyumen",
      "apiBase": "https://lyumen-api.okotto.workers.dev/v1"
    }
  ]
}

OpenWebUI

To use Lyumen with OpenWebUI:

  • Go to Admin Panel → Settings
  • Select Connections → OpenAI
  • Set the API Base URL to https://lyumen-api.okotto.workers.dev/v1
  • Set a dummy API Key (e.g., lyumen)