API Reference

LLMKit provides OpenAI-compatible API endpoints for inference, so you can call your managed prompts from any OpenAI client library. This reference covers the inference endpoints only; the management endpoints used by the UI are not documented here.

Authentication

All inference requests require an API key for authentication. Include your API key in the Authorization header as follows:

text
Authorization: Bearer <YOUR_API_KEY>

Generate API keys in the LLMKit management interface (e.g., the UI at http://localhost:3000/settings after setup), and treat them as secrets, just like passwords.
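
Because the endpoints are OpenAI-compatible, most OpenAI client libraries only need a custom base URL plus this API key. Below is a minimal sketch using the official Python openai package; the base URL assumes a local deployment on port 8000, as in the curl examples later in this reference, and the key is a placeholder.

python
from openai import OpenAI

# Point the standard OpenAI client at your LLMKit deployment.
# Base URL and key are placeholders; adjust them to your setup.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="llmkit_yourkey",
)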

Endpoints

Create Chat Completion

  • URL: /v1/chat/completions
  • Method: POST
  • Description: Generates a non-streaming chat completion using the specified prompt key.

Request

  • Headers:
    • Content-Type: application/json
    • Authorization: Bearer <YOUR_API_KEY>
  • Body:
    json
    {
      "model": "your-prompt-key",
      "messages": [
        {
          "role": "system",
          "content": "{\"variable1\": \"value1\", \"variable2\": \"value2\"}"
        },
        {
          "role": "user",
          "content": "Your message here"
        }
      ],
      "temperature": 0.7,
      "max_tokens": 500,
      "response_format": {
        "type": "json_object"
      }
    }
    
  • Parameters:
    • model (string, required): The key of the prompt to use (e.g., my-test-prompt). In LLMKit, this corresponds to a prompt key rather than a model name.
    • messages (array, required): An array of message objects:
      • role (string): One of system, user, or assistant.
      • content (string): For dynamic prompts, the system message content must be a JSON string containing the variables (e.g., {"name": "John"}); a sketch of building this string follows this list.
    • temperature (number, optional): Sampling temperature, defaults to 0.7.
    • max_tokens (integer, optional): Maximum number of tokens to generate, defaults to 256.
    • response_format (object, optional): Set to {"type": "json_object"} to request JSON output (see the JSON Mode note under Additional Notes).
    • Other OpenAI-compatible parameters (e.g., top_p, stop) may be supported; refer to OpenAI's documentation for additional options.
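
If you assemble the request body by hand, the system message content for a dynamic prompt is simply your variables serialized as a JSON string, which is why it appears escaped inside the JSON body above. A small sketch in Python (the variable names are illustrative):

python
import json

# Variables to substitute into the prompt template (names are illustrative).
variables = {"variable1": "value1", "variable2": "value2"}

messages = [
    # The system content is the variables serialized as a JSON string.
    {"role": "system", "content": json.dumps(variables)},
    {"role": "user", "content": "Your message here"},
]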

Response

  • Success (200 OK):
    json
    {
      "id": "chatcmpl-123",
      "object": "chat.completion",
      "created": 1677652288,
      "model": "your-prompt-key",
      "choices": [
        {
          "index": 0,
          "message": {
            "role": "assistant",
            "content": "Response from the model"
          },
          "finish_reason": "stop"
        }
      ],
      "usage": {
        "prompt_tokens": 9,
        "completion_tokens": 12,
        "total_tokens": 21
      }
    }
    
  • Errors:
    • 401 Unauthorized: Invalid or missing API key.
      json
      {
        "error": "Invalid API key"
      }
      
    • 404 Not Found: Prompt key not found.
      json
      {
        "error": "Prompt 'your-prompt-key' not found"
      }
      
    • 500 Internal Server Error: Server-side error.
      json
      {
        "error": "Internal server error"
      }
      

Example

bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer llmkit_yourkey" \
  -d '{
    "model": "my-test-prompt",
    "messages": [
      {
        "role": "system",
        "content": "{\"name\": \"John\", \"city\": \"San Francisco\"}"
      },
      {
        "role": "user",
        "content": "Tell me a joke."
      }
    ]
  }'
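
The same request with the Python openai client might look like the sketch below; the prompt key, variables, and API key are placeholders.

python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="llmkit_yourkey")

# "model" is the LLMKit prompt key, not an underlying model name.
response = client.chat.completions.create(
    model="my-test-prompt",
    messages=[
        {"role": "system", "content": '{"name": "John", "city": "San Francisco"}'},
        {"role": "user", "content": "Tell me a joke."},
    ],
)

print(response.choices[0].message.content)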

Create Streaming Chat Completion

  • URL: /v1/chat/completions/stream
  • Method: POST
  • Description: Generates a streaming chat completion, returning response chunks as they are produced.

Request

  • Headers:
    • Content-Type: application/json
    • Authorization: Bearer <YOUR_API_KEY>
  • Body:
    json
    {
      "model": "your-prompt-key",
      "messages": [
        {
          "role": "system",
          "content": "{\"variable1\": \"value1\", \"variable2\": \"value2\"}"
        },
        {
          "role": "user",
          "content": "Your message here"
        }
      ],
      "stream": true,
      "temperature": 0.7,
      "max_tokens": 500
    }
    
  • Parameters:
    • Same as the non-streaming endpoint, with the addition of:
      • stream (boolean, required): Must be set to true to enable streaming.

Response

  • Success (200 OK):
    • A series of JSON objects, each on its own line, representing response chunks.
    • Example:
      text
      {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"your-prompt-key","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
      {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"your-prompt-key","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
      {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"your-prompt-key","choices":[{"index":0,"delta":{"content":" there"},"finish_reason":null}]}
      {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"your-prompt-key","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
      
  • Errors:
    • Same as the non-streaming endpoint, returned as a single JSON object.

Example

bash
curl -X POST http://localhost:8000/v1/chat/completions/stream \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer llmkit_yourkey" \
  -d '{
    "model": "my-test-prompt",
    "messages": [
      {
        "role": "system",
        "content": "{\"name\": \"John\", \"city\": \"San Francisco\"}"
      },
      {
        "role": "user",
        "content": "Tell me a joke."
      }
    ],
    "stream": true
  }'
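
Because the chunks are returned as newline-delimited JSON, a plain HTTP client can consume them directly. A sketch using the Python requests library, with the URL, API key, and prompt key as placeholders and the chunk format taken from the example above:

python
import json
import requests

payload = {
    "model": "my-test-prompt",
    "messages": [
        {"role": "system", "content": '{"name": "John", "city": "San Francisco"}'},
        {"role": "user", "content": "Tell me a joke."},
    ],
    "stream": True,
}

with requests.post(
    "http://localhost:8000/v1/chat/completions/stream",
    headers={"Authorization": "Bearer llmkit_yourkey"},
    json=payload,
    stream=True,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue  # skip any blank lines between chunks
        chunk = json.loads(line)
        delta = chunk["choices"][0]["delta"]
        # Each chunk carries an incremental piece of the assistant message.
        print(delta.get("content", ""), end="", flush=True)
print()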

Additional Notes

  • Prompt Keys: The model parameter in requests refers to an LLMKit prompt key (e.g., my-test-prompt), not a specific model name. Prompts are configured in the management interface and linked to underlying models.
  • Dynamic Prompts: For prompts with dynamic variables, include a system message with a JSON string (e.g., {"name": "John"}). These variables are substituted into the prompt template.
  • Tool Calling: If a prompt is configured with tools, the response choices may include a tool_calls array:
    json
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\": \"San Francisco\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
    

    Handle these as you would with OpenAI's tool calling feature; a sketch follows these notes.
  • JSON Mode: JSON mode and JSON schemas are defined in the LLMKit UI as part of the prompt; they are not passed in with the API call.
  • Compatibility: These endpoints mirror OpenAI's chat completion API. For additional parameters or response details, consult OpenAI's API documentation, noting that model is a prompt key in LLMKit.
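
If a prompt has tools attached, handling the response is the same as with the OpenAI SDK. A minimal sketch, assuming a prompt key configured with tools and using get_weather only as an illustrative tool name:

python
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="llmkit_yourkey")

response = client.chat.completions.create(
    model="my-test-prompt",  # assumed to be a prompt key with tools configured
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
)

message = response.choices[0].message
if message.tool_calls:
    for call in message.tool_calls:
        # Arguments arrive as a JSON string, exactly as in OpenAI's tool calling.
        args = json.loads(call.function.arguments)
        print(f"Model requested {call.function.name} with {args}")
else:
    print(message.content)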

For practical usage examples in various programming languages, refer to the Code Examples section.