AI Inference Platform API Documentation
Overview
The AI Inference Platform provides a high-performance HTTP API for querying AI models with intelligent routing, caching, and classification.
Base URL: http://localhost:8080 (default)
Version: v1
Endpoints
1. Inference
Submit a query to the AI platform. The system classifies the query's complexity and routes the query to an appropriate model: a smaller, faster model for simple queries, or a larger, more capable model for complex reasoning.
- URL: /api/v1/infer
- Method: POST
- Content-Type: application/json
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | The text query to process. |
| context | string | No | Additional context for the query. |
| max_tokens | integer | No | Maximum number of tokens to generate. |
| temperature | float | No | Sampling temperature (0.0 to 1.0). |
| user_id | string | No | Unique identifier for the user. |
| session_id | string | No | Unique identifier for the session. |
| skip_cache | boolean | No | If true, bypasses the cache and forces a fresh inference. |
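The request fields above can be assembled with a small helper. The following is a minimal sketch using only the Python standard library; the function name `build_infer_request` is hypothetical, and the client-side checks are an assumption based on the documented temperature range and the "Query cannot be empty" error. The server remains the authority on validation.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # default base URL from this document


def build_infer_request(query, *, context=None, max_tokens=None,
                        temperature=None, user_id=None, session_id=None,
                        skip_cache=None, base_url=BASE_URL):
    """Build (but do not send) a POST request for /api/v1/infer."""
    # Assumed client-side checks; they mirror the documented constraints.
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    optional = {
        "context": context,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "user_id": user_id,
        "session_id": session_id,
        "skip_cache": skip_cache,
    }
    # Only send optional fields that the caller actually set.
    payload = {"query": query}
    payload.update({k: v for k, v in optional.items() if v is not None})

    return urllib.request.Request(
        url=f"{base_url}/api/v1/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_infer_request(
    "Explain quantum entanglement in simple terms",
    temperature=0.7,
    max_tokens=150,
)
# To send it against a running server:
#   urllib.request.urlopen(request, timeout=30)
```

Building the request separately from sending it keeps the payload logic easy to inspect without a running server.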
Example Request
{
  "query": "Explain quantum entanglement in simple terms",
  "temperature": 0.7,
  "max_tokens": 150
}
Success Response (200 OK)
{
  "success": true,
  "data": {
    "response": "Quantum entanglement is a phenomenon where...",
    "model_used": "complex-model-v4",
    "complexity": "complex",
    "confidence": 0.98,
    "latency_ms": 1250,
    "cached": false,
    "tokens_used": {
      "prompt_tokens": 15,
      "completion_tokens": 85,
      "total_tokens": 100,
      "estimated_cost": 0.002
    }
  },
  "error": null,
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2023-10-27T10:00:00Z"
}
Error Response (400 Bad Request)
{
  "success": false,
  "data": null,
  "error": "Query cannot be empty",
  "request_id": "...",
  "timestamp": "..."
}
2. Classification
Classify a query's complexity without performing the actual inference. Useful for pre-flight checks or cost estimation.
- URL: /api/v1/classify
- Method: POST
- Content-Type: application/json
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | The query to classify. |
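As an illustration of the pre-flight use case, a client could inspect the classification result before deciding whether to send the full inference request. This is a minimal sketch that operates on an already-decoded response; the function name `preflight` and the `min_confidence` threshold of 0.8 are hypothetical and not part of the API.

```python
def preflight(envelope, min_confidence=0.8):
    """Return the predicted complexity, or None if the result is unusable.

    `envelope` is the decoded JSON body returned by /api/v1/classify.
    `min_confidence` is an assumed client-side threshold.
    """
    if not envelope.get("success"):
        return None
    data = envelope.get("data") or {}
    if data.get("confidence", 0.0) < min_confidence:
        return None
    return data.get("complexity")


# Sample body matching the classification response documented in this section.
sample = {
    "success": True,
    "data": {
        "complexity": "simple",
        "confidence": 0.99,
        "scores": {"simple": 0.99, "complex": 0.01},
    },
    "error": None,
    "request_id": "...",
    "timestamp": "...",
}

print(preflight(sample))  # simple
```

A caller might, for example, skip or defer queries classified as `complex` when working under a cost budget.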
Example Request
{
  "query": "What is 2+2?"
}
Success Response (200 OK)
{
  "success": true,
  "data": {
    "complexity": "simple",
    "confidence": 0.99,
    "scores": {
      "simple": 0.99,
      "complex": 0.01
    }
  },
  "error": null,
  "request_id": "...",
  "timestamp": "..."
}
3. Health Check
Check the health status of the service and its dependencies (Redis and the simple and complex models).
- URL: /health
- Method: GET
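A monitoring script might reduce the health response to the list of dependencies that are down. The sketch below works on a decoded response body and assumes the `services` map holds one boolean per dependency, as in the example response in this section; the function name `failing_services` is hypothetical.

```python
def failing_services(envelope):
    """Return the sorted names of services reported as unhealthy."""
    data = envelope.get("data") or {}
    services = data.get("services") or {}
    return sorted(name for name, ok in services.items() if not ok)


# Sample body shaped like the health response, with Redis marked as down.
degraded = {
    "success": True,
    "data": {
        "status": "healthy",
        "version": "0.1.0",
        "uptime_seconds": 3600,
        "services": {
            "redis": False,
            "simple_model": True,
            "complex_model": True,
        },
    },
    "error": None,
}

print(failing_services(degraded))  # ['redis']
```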
Success Response (200 OK)
{
  "success": true,
  "data": {
    "status": "healthy",
    "version": "0.1.0",
    "uptime_seconds": 3600,
    "services": {
      "redis": true,
      "simple_model": true,
      "complex_model": true
    }
  },
  "error": null,
  "request_id": "...",
  "timestamp": "..."
}
4. Metrics
Retrieve Prometheus metrics for monitoring.
- URL: /metrics
- Method: GET
- Format: Prometheus text format
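For ad hoc inspection outside a Prometheus server, the text format can be read with a few lines of code. The following is a simplified sketch, not a complete parser: it skips comment lines, ignores optional timestamps, and keeps the label set as part of the key. The metric names in the sample are hypothetical, since this document does not list the metrics the service exports.

```python
def parse_prometheus_text(text):
    """Parse sample lines into {"name{labels}": float value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Labels present: everything up to the last "}" is the key.
            key, _, rest = line.rpartition("}")
            key += "}"
        else:
            key, _, rest = line.partition(" ")
        fields = rest.split()
        samples[key] = float(fields[0])  # fields[1], if any, is a timestamp
    return samples


# Hypothetical metric names, used only to illustrate the format.
sample_text = """\
# HELP inference_requests_total Total inference requests.
# TYPE inference_requests_total counter
inference_requests_total{model="simple"} 800
inference_requests_total{model="complex"} 200
active_requests 5
"""

metrics = parse_prometheus_text(sample_text)
print(metrics["active_requests"])  # 5.0
```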
5. Statistics
Get runtime statistics about cache usage and routing.
- URL: /api/stats
- Method: GET
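The routing counters in the statistics response can be turned into a routing share with simple arithmetic. This sketch assumes the `router_stats` field names shown in the example response in this section; the function name `routing_share` is hypothetical.

```python
def routing_share(envelope):
    """Return the fraction of requests routed to each model tier."""
    stats = envelope["data"]["router_stats"]
    total = stats["total_requests"]
    if total == 0:
        # Avoid dividing by zero before any request has been served.
        return {"simple": 0.0, "complex": 0.0}
    return {
        "simple": stats["simple_routed"] / total,
        "complex": stats["complex_routed"] / total,
    }


# Sample body using the counters from the example statistics response.
stats_sample = {
    "success": True,
    "data": {
        "router_stats": {
            "total_requests": 1000,
            "simple_routed": 800,
            "complex_routed": 200,
        },
    },
}

print(routing_share(stats_sample))  # {'simple': 0.8, 'complex': 0.2}
```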
Success Response (200 OK)
{
  "success": true,
  "data": {
    "cache_stats": {
      "local_cache_size": 150,
      "redis_connected": true,
      "redis_key_count": 5000
    },
    "router_stats": {
      "total_requests": 1000,
      "simple_routed": 800,
      "complex_routed": 200
    },
    "active_requests": 5
  },
  "error": null,
  "request_id": "...",
  "timestamp": "..."
}
6. Clear Cache
Clear both the local and Redis caches. This endpoint is intended for administrators only, but the current implementation does not restrict access.
- URL: /api/cache/clear
- Method: POST
Success Response (200 OK)
{
  "success": true,
  "data": "Cache cleared successfully",
  "error": null,
  "request_id": "...",
  "timestamp": "..."
}
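Every JSON response in this document shares the same envelope (`success`, `data`, `error`, `request_id`, `timestamp`), so a client can unwrap it in one place. The following is a minimal sketch; the names `ApiError` and `unwrap` are hypothetical, and it assumes that `success` is false whenever `error` is set, as in the examples above.

```python
import json


class ApiError(Exception):
    """Raised when the envelope reports success == false."""

    def __init__(self, message, request_id=None):
        super().__init__(message)
        self.request_id = request_id


def unwrap(body):
    """Decode a response body and return its `data` field, or raise ApiError."""
    envelope = json.loads(body)
    if not envelope.get("success"):
        raise ApiError(
            envelope.get("error") or "unknown error",
            envelope.get("request_id"),
        )
    return envelope.get("data")


ok_body = (
    '{"success": true, "data": "Cache cleared successfully", '
    '"error": null, "request_id": "...", "timestamp": "..."}'
)
print(unwrap(ok_body))  # Cache cleared successfully
```

Keeping the `request_id` on the exception makes it easier to correlate a client-side failure with server logs.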