# API Reference
LLMKit provides OpenAI-compatible API endpoints for inference, enabling you to leverage your managed prompts seamlessly with any OpenAI client library. This reference details the endpoints designed for executing prompts, excluding management-specific endpoints used by the UI.
## Authentication

All inference requests require an API key for authentication. Include your API key in the `Authorization` header as follows:

```bash
Authorization: Bearer <YOUR_API_KEY>
```

To obtain an API key, use the LLMKit management interface (e.g., via the UI at http://localhost:3000/settings after setup). Treat the API key as sensitive, like a password.
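Because the endpoints are OpenAI-compatible, an existing OpenAI client library can typically be pointed at LLMKit by overriding the base URL and supplying your LLMKit API key. A minimal Python sketch, assuming the OpenAI Python SDK (v1.x), the placeholder base URL `http://localhost:8000/v1`, and a prompt key `my-test-prompt` taken from the examples below:

```python
# Sketch: reusing the OpenAI Python SDK against LLMKit's OpenAI-compatible API.
# The base URL, API key, and prompt key below are placeholders from this document.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",   # LLMKit's OpenAI-compatible base URL
    api_key="llmkit_yourkey",               # your LLMKit API key
)

response = client.chat.completions.create(
    model="my-test-prompt",  # an LLMKit prompt key, not a model name
    messages=[
        # For dynamic prompts, the system content is a JSON string of variables.
        {"role": "system", "content": '{"name": "John", "city": "San Francisco"}'},
        {"role": "user", "content": "Tell me a joke."},
    ],
)
print(response.choices[0].message.content)
```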
## Endpoints

### Create Chat Completion

- URL: `/v1/chat/completions`
- Method: `POST`
- Description: Generates a non-streaming chat completion using the specified prompt key.
#### Request

- Headers:
  - `Content-Type: application/json`
  - `Authorization: Bearer <YOUR_API_KEY>`
- Body:

```json
{
  "model": "your-prompt-key",
  "messages": [
    {
      "role": "system",
      "content": "{\"variable1\": \"value1\", \"variable2\": \"value2\"}"
    },
    {
      "role": "user",
      "content": "Your message here"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 500,
  "response_format": { "type": "json_object" }
}
```
- Parameters:
  - `model` (string, required): The key of the prompt to use (e.g., `my-test-prompt`). In LLMKit, this corresponds to a prompt key rather than a model name.
  - `messages` (array, required): An array of message objects:
    - `role` (string): One of `system`, `user`, or `assistant`.
    - `content` (string): For dynamic prompts, the system message content must be a JSON string containing variables (e.g., `{"name": "John"}`).
  - `temperature` (number, optional): Sampling temperature; defaults to 0.7.
  - `max_tokens` (integer, optional): Maximum number of tokens to generate; defaults to 256.
  - `response_format` (object, optional): Set to `{"type": "json_object"}` to request JSON output.
  - Other OpenAI-compatible parameters (e.g., `top_p`, `stop`) may be supported; refer to OpenAI's documentation for additional options.
#### Response

- Success (200 OK):

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "your-prompt-key",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Response from the model"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```
- Errors:
  - `401 Unauthorized`: Invalid or missing API key. Body: `{"error": "Invalid API key"}`
  - `404 Not Found`: Prompt key not found. Body: `{"error": "Prompt 'your-prompt-key' not found"}`
  - `500 Internal Server Error`: Server-side error. Body: `{"error": "Internal server error"}`
#### Example

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer llmkit_yourkey" \
-d '{
"model": "my-test-prompt",
"messages": [
{
"role": "system",
"content": "{\"name\": \"John\", \"city\": \"San Francisco\"}"
},
{
"role": "user",
"content": "Tell me a joke."
}
]
}'
```
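The same request can be made without an SDK. A minimal Python sketch using `requests`, mirroring the placeholder URL, API key, and prompt key from the curl example above:

```python
# Sketch of the same non-streaming request over plain HTTP.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer llmkit_yourkey",
    },
    json={
        "model": "my-test-prompt",
        "messages": [
            {"role": "system", "content": '{"name": "John", "city": "San Francisco"}'},
            {"role": "user", "content": "Tell me a joke."},
        ],
    },
    timeout=30,
)
resp.raise_for_status()  # surfaces the 401/404/500 errors described above
data = resp.json()
print(data["choices"][0]["message"]["content"])
```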
### Create Streaming Chat Completion

- URL: `/v1/chat/completions/stream`
- Method: `POST`
- Description: Generates a streaming chat completion, returning response chunks as they are produced.
#### Request

- Headers:
  - `Content-Type: application/json`
  - `Authorization: Bearer <YOUR_API_KEY>`
- Body:

```json
{
  "model": "your-prompt-key",
  "messages": [
    {
      "role": "system",
      "content": "{\"variable1\": \"value1\", \"variable2\": \"value2\"}"
    },
    {
      "role": "user",
      "content": "Your message here"
    }
  ],
  "stream": true,
  "temperature": 0.7,
  "max_tokens": 500
}
```
- Parameters:
  - Same as the non-streaming endpoint, with the addition of:
    - `stream` (boolean, required): Must be set to `true` to enable streaming.
#### Response

- Success (200 OK):
  - A series of JSON objects, each on its own line, representing response chunks.
  - Example:

```text
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"your-prompt-key","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"your-prompt-key","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"your-prompt-key","choices":[{"index":0,"delta":{"content":" there"},"finish_reason":null}]}
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"your-prompt-key","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
```
- Errors:
  - Same as the non-streaming endpoint, returned as a single JSON object.
#### Example

```bash
curl -X POST http://localhost:8000/v1/chat/completions/stream \
-H "Content-Type: application/json" \
-H "Authorization: Bearer llmkit_yourkey" \
-d '{
"model": "my-test-prompt",
"messages": [
{
"role": "system",
"content": "{\"name\": \"John\", \"city\": \"San Francisco\"}"
},
{
"role": "user",
"content": "Tell me a joke."
}
],
"stream": true
}'
```
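Because this endpoint returns one JSON chunk per line, a client can read the body incrementally and parse each line as it arrives. A Python sketch under that assumption (newline-delimited JSON rather than SSE framing), reusing the placeholder URL, key, and prompt key from the example above:

```python
# Sketch: consuming the streaming endpoint line by line and printing the deltas.
import json
import requests

with requests.post(
    "http://localhost:8000/v1/chat/completions/stream",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer llmkit_yourkey",
    },
    json={
        "model": "my-test-prompt",
        "messages": [
            {"role": "system", "content": '{"name": "John", "city": "San Francisco"}'},
            {"role": "user", "content": "Tell me a joke."},
        ],
        "stream": True,
    },
    stream=True,   # tell requests not to buffer the whole body
    timeout=60,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue  # skip any blank keep-alive lines
        chunk = json.loads(line)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            print(delta["content"], end="", flush=True)
print()
```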
## Additional Notes

- Prompt Keys: The `model` parameter in requests refers to an LLMKit prompt key (e.g., `my-test-prompt`), not a specific model name. Prompts are configured in the management interface and linked to underlying models.
- Dynamic Prompts: For prompts with dynamic variables, include a system message whose content is a JSON string (e.g., `{"name": "John"}`). These variables are substituted into the prompt template.
- Tool Calling: If a prompt is configured with tools, a response choice may include a `tool_calls` array, which you handle as you would OpenAI's tool calling feature (see the Python sketch after this list). Example:

```json
{
  "index": 0,
  "message": {
    "role": "assistant",
    "content": null,
    "tool_calls": [
      {
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\": \"San Francisco\"}"
        }
      }
    ]
  },
  "finish_reason": "tool_calls"
}
```

- JSON Mode: JSON mode and JSON schema are defined in the LLMKit UI as part of the prompt; they are not passed in during the API call.
- Compatibility: These endpoints mirror OpenAI's chat completion API. For additional parameters or response details, consult OpenAI's API documentation, noting that `model` is a prompt key in LLMKit.
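As referenced in the Tool Calling note above, a choice that finishes with `tool_calls` can be dispatched the same way as with OpenAI. A Python sketch; the `get_weather` function and the dispatch logic are illustrative placeholders, not part of LLMKit:

```python
# Sketch: dispatching a tool call from a response choice.
import json

def get_weather(location: str) -> str:
    # Hypothetical local tool, for illustration only.
    return f"Sunny in {location}"

def handle_choice(choice: dict) -> None:
    """Run tool calls when present; otherwise print the assistant content."""
    if choice.get("finish_reason") == "tool_calls":
        for call in choice["message"]["tool_calls"]:
            if call["type"] == "function" and call["function"]["name"] == "get_weather":
                args = json.loads(call["function"]["arguments"])
                # Typically you would send the result back in a follow-up request
                # as a {"role": "tool", ...} message, as with OpenAI's tool calling.
                print(get_weather(**args))
    else:
        print(choice["message"]["content"])

# Example using the tool-call response shape shown above.
handle_choice({
    "index": 0,
    "message": {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_abc123",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": "{\"location\": \"San Francisco\"}",
            },
        }],
    },
    "finish_reason": "tool_calls",
})
```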
For practical usage examples in various programming languages, refer to the Code Examples section.