# API Reference
LLMKit provides OpenAI-compatible API endpoints for inference, enabling you to leverage your managed prompts seamlessly with any OpenAI client library. This reference details the endpoints designed for executing prompts, excluding management-specific endpoints used by the UI.
## Authentication

All inference requests require an API key for authentication. Include your API key in the `Authorization` header as follows:

```bash
Authorization: Bearer <YOUR_API_KEY>
```

To obtain an API key, use the LLMKit management interface (e.g., via the UI at http://localhost:3000/settings after setup). Treat the API key as sensitive, like a password.
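Because the endpoints are OpenAI-compatible, an existing OpenAI client library can typically be pointed at LLMKit by overriding the base URL and supplying your LLMKit API key. A minimal Python sketch, assuming the OpenAI Python SDK (v1.x), the placeholder base URL `http://localhost:8000/v1`, and a prompt key `my-test-prompt` taken from the examples below:

```python
# Sketch: reusing the OpenAI Python SDK against LLMKit's OpenAI-compatible API.
# The base URL, API key, and prompt key below are placeholders from this document.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",   # LLMKit's OpenAI-compatible base URL
    api_key="llmkit_yourkey",               # your LLMKit API key
)

response = client.chat.completions.create(
    model="my-test-prompt",  # an LLMKit prompt key, not a model name
    messages=[
        # For dynamic prompts, the system content is a JSON string of variables.
        {"role": "system", "content": '{"name": "John", "city": "San Francisco"}'},
        {"role": "user", "content": "Tell me a joke."},
    ],
)
print(response.choices[0].message.content)
```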
## Endpoints

### Create Chat Completion

- URL: `/v1/chat/completions`
- Method: `POST`
- Description: Generates a non-streaming chat completion using the specified prompt key.
#### Request

- Headers:
  - `Content-Type: application/json`
  - `Authorization: Bearer <YOUR_API_KEY>`
- Body:

```json
{
  "model": "your-prompt-key",
  "messages": [
    {
      "role": "system",
      "content": "{\"variable1\": \"value1\", \"variable2\": \"value2\"}"
    },
    {
      "role": "user",
      "content": "Your message here"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 500,
  "response_format": { "type": "json_object" }
}
```
- Parameters:
  - `model` (string, required): The key of the prompt to use (e.g., `my-test-prompt`). In LLMKit, this corresponds to a prompt key rather than a model name.
  - `messages` (array, required): An array of message objects:
    - `role` (string): One of `system`, `user`, or `assistant`.
    - `content` (string): For dynamic prompts, the system message content must be a JSON string containing variables (e.g., `{"name": "John"}`).
  - `temperature` (number, optional): Sampling temperature; defaults to 0.7.
  - `max_tokens` (integer, optional): Maximum number of tokens to generate; defaults to 256.
  - `response_format` (object, optional): Set to `{"type": "json_object"}` to request JSON output.
  - Other OpenAI-compatible parameters (e.g., `top_p`, `stop`) may be supported; refer to OpenAI's documentation for additional options.
#### Response

- Success (200 OK):

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "your-prompt-key",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Response from the model"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```
- Errors:
  - `401 Unauthorized`: Invalid or missing API key. Body: `{"error": "Invalid API key"}`
  - `404 Not Found`: Prompt key not found. Body: `{"error": "Prompt 'your-prompt-key' not found"}`
  - `500 Internal Server Error`: Server-side error. Body: `{"error": "Internal server error"}`
#### Example

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer llmkit_yourkey" \
-d '{
"model": "my-test-prompt",
"messages": [
{
"role": "system",
"content": "{\"name\": \"John\", \"city\": \"San Francisco\"}"
},
{
"role": "user",
"content": "Tell me a joke."
}
]
}'
```
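The same request can be made without an SDK. A minimal Python sketch using `requests`, mirroring the placeholder URL, API key, and prompt key from the curl example above:

```python
# Sketch of the same non-streaming request over plain HTTP.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer llmkit_yourkey",
    },
    json={
        "model": "my-test-prompt",
        "messages": [
            {"role": "system", "content": '{"name": "John", "city": "San Francisco"}'},
            {"role": "user", "content": "Tell me a joke."},
        ],
    },
    timeout=30,
)
resp.raise_for_status()  # surfaces the 401/404/500 errors described above
data = resp.json()
print(data["choices"][0]["message"]["content"])
```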
### Create Streaming Chat Completion

- URL: `/v1/chat/completions/stream`
- Method: `POST`
- Description: Generates a streaming chat completion, returning response chunks as they are produced.
#### Request

- Headers:
  - `Content-Type: application/json`
  - `Authorization: Bearer <YOUR_API_KEY>`
- Body:

```json
{
  "model": "your-prompt-key",
  "messages": [
    {
      "role": "system",
      "content": "{\"variable1\": \"value1\", \"variable2\": \"value2\"}"
    },
    {
      "role": "user",
      "content": "Your message here"
    }
  ],
  "stream": true,
  "temperature": 0.7,
  "max_tokens": 500
}
```
- Parameters:
  - Same as the non-streaming endpoint, with the addition of:
    - `stream` (boolean, required): Must be set to `true` to enable streaming.
#### Response

- Success (200 OK):
  - A series of JSON objects, each on its own line, representing response chunks.
  - Example:

```text
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"your-prompt-key","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"your-prompt-key","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"your-prompt-key","choices":[{"index":0,"delta":{"content":" there"},"finish_reason":null}]}
{"id":"chatcmpl-123","object":"chat.completion.chunk","created":1677652288,"model":"your-prompt-key","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
```
- Errors:
  - Same as the non-streaming endpoint, returned as a single JSON object.
#### Example

```bash
curl -X POST http://localhost:8000/v1/chat/completions/stream \
-H "Content-Type: application/json" \
-H "Authorization: Bearer llmkit_yourkey" \
-d '{
"model": "my-test-prompt",
"messages": [
{
"role": "system",
"content": "{\"name\": \"John\", \"city\": \"San Francisco\"}"
},
{
"role": "user",
"content": "Tell me a joke."
}
],
"stream": true
}'
```
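Because this endpoint returns one JSON chunk per line, a client can read the body incrementally and parse each line as it arrives. A Python sketch under that assumption (newline-delimited JSON rather than SSE framing), reusing the placeholder URL, key, and prompt key from the example above:

```python
# Sketch: consuming the streaming endpoint line by line and printing the deltas.
import json
import requests

with requests.post(
    "http://localhost:8000/v1/chat/completions/stream",
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer llmkit_yourkey",
    },
    json={
        "model": "my-test-prompt",
        "messages": [
            {"role": "system", "content": '{"name": "John", "city": "San Francisco"}'},
            {"role": "user", "content": "Tell me a joke."},
        ],
        "stream": True,
    },
    stream=True,   # tell requests not to buffer the whole body
    timeout=60,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue  # skip any blank keep-alive lines
        chunk = json.loads(line)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            print(delta["content"], end="", flush=True)
print()
```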
## Additional Notes

- Prompt Keys: The `model` parameter in requests refers to an LLMKit prompt key (e.g., `my-test-prompt`), not a specific model name. Prompts are configured in the management interface and linked to underlying models.
- Dynamic Prompts: For prompts with dynamic variables, include a system message whose content is a JSON string (e.g., `{"name": "John"}`). These variables are substituted into the prompt template.
- Tool Calling: If a prompt is configured with tools, a response choice may include a `tool_calls` array, which you handle as you would OpenAI's tool calling feature (see the Python sketch after this list). Example:

```json
{
  "index": 0,
  "message": {
    "role": "assistant",
    "content": null,
    "tool_calls": [
      {
        "id": "call_abc123",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\": \"San Francisco\"}"
        }
      }
    ]
  },
  "finish_reason": "tool_calls"
}
```

- JSON Mode: JSON mode and JSON schema are defined in the LLMKit UI as part of the prompt; they are not passed in during the API call.
- Compatibility: These endpoints mirror OpenAI's chat completion API. For additional parameters or response details, consult OpenAI's API documentation, noting that `model` is a prompt key in LLMKit.
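As referenced in the Tool Calling note above, a choice that finishes with `tool_calls` can be dispatched the same way as with OpenAI. A Python sketch; the `get_weather` function and the dispatch logic are illustrative placeholders, not part of LLMKit:

```python
# Sketch: dispatching a tool call from a response choice.
import json

def get_weather(location: str) -> str:
    # Hypothetical local tool, for illustration only.
    return f"Sunny in {location}"

def handle_choice(choice: dict) -> None:
    """Run tool calls when present; otherwise print the assistant content."""
    if choice.get("finish_reason") == "tool_calls":
        for call in choice["message"]["tool_calls"]:
            if call["type"] == "function" and call["function"]["name"] == "get_weather":
                args = json.loads(call["function"]["arguments"])
                # Typically you would send the result back in a follow-up request
                # as a {"role": "tool", ...} message, as with OpenAI's tool calling.
                print(get_weather(**args))
    else:
        print(choice["message"]["content"])

# Example using the tool-call response shape shown above.
handle_choice({
    "index": 0,
    "message": {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_abc123",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": "{\"location\": \"San Francisco\"}",
            },
        }],
    },
    "finish_reason": "tool_calls",
})
```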
For practical usage examples in various programming languages, refer to the Code Examples section.