AI Inference Platform API Documentation

Overview

The AI Inference Platform provides a high-performance HTTP API for querying AI models with intelligent routing, caching, and classification.

Base URL: http://localhost:8080 (default)
Version: v1

Endpoints

1. Inference

Submit a query to the AI platform. The system classifies the query's complexity and routes it to an appropriate model: a smaller, faster model for simple queries, or a larger, more capable model for complex reasoning.

  • URL: /api/v1/infer
  • Method: POST
  • Content-Type: application/json

Request Body

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| query | string | Yes | The text query to process. |
| context | string | No | Additional context for the query. |
| max_tokens | integer | No | Maximum number of tokens to generate. |
| temperature | float | No | Sampling temperature (0.0 to 1.0). |
| user_id | string | No | Unique identifier for the user. |
| session_id | string | No | Unique identifier for the session. |
| skip_cache | boolean | No | If true, bypasses the cache and forces a fresh inference. |

Example Request

json
{
  "query": "Explain quantum entanglement in simple terms",
  "temperature": 0.7,
  "max_tokens": 150
}
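
The request above could be built from Python roughly as follows. This is a minimal sketch using only the standard library; the helper names `build_infer_request` and `send` are hypothetical, and the client-side checks simply mirror the field table (non-empty `query`, `temperature` between 0.0 and 1.0) rather than any documented server behavior.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # default base URL from this document


def build_infer_request(query, **options):
    """Build (but do not send) a POST request for /api/v1/infer."""
    if not query or not query.strip():
        raise ValueError("query is required and cannot be empty")
    temperature = options.get("temperature")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    # Optional fields are sent only when the caller provided a value.
    payload = {"query": query}
    payload.update({k: v for k, v in options.items() if v is not None})
    return urllib.request.Request(
        BASE_URL + "/api/v1/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `send(build_infer_request("Explain quantum entanglement in simple terms", temperature=0.7, max_tokens=150))` against a running instance should return the response envelope as a dictionary. Note that `urlopen` raises `urllib.error.HTTPError` for non-2xx statuses.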

Success Response (200 OK)

json
{
  "success": true,
  "data": {
    "response": "Quantum entanglement is a phenomenon where...",
    "model_used": "complex-model-v4",
    "complexity": "complex",
    "confidence": 0.98,
    "latency_ms": 1250,
    "cached": false,
    "tokens_used": {
      "prompt_tokens": 15,
      "completion_tokens": 85,
      "total_tokens": 100,
      "estimated_cost": 0.002
    }
  },
  "error": null,
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2023-10-27T10:00:00Z"
}
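
A client may prefer a typed view of this response over nested dictionaries. The sketch below flattens the `data` object into a dataclass; `InferenceResult` and `parse_inference` are hypothetical names, and the field set is taken only from the example above, so it may not cover every field the service can return.

```python
from dataclasses import dataclass


@dataclass
class InferenceResult:
    response: str
    model_used: str
    complexity: str
    confidence: float
    latency_ms: int
    cached: bool
    total_tokens: int
    estimated_cost: float


def parse_inference(envelope):
    """Flatten the `data` object of a successful /api/v1/infer response."""
    data = envelope["data"]
    usage = data["tokens_used"]
    return InferenceResult(
        response=data["response"],
        model_used=data["model_used"],
        complexity=data["complexity"],
        confidence=data["confidence"],
        latency_ms=data["latency_ms"],
        cached=data["cached"],
        total_tokens=usage["total_tokens"],
        estimated_cost=usage["estimated_cost"],
    )
```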

Error Response (400 Bad Request)

json
{
  "success": false,
  "data": null,
  "error": "Query cannot be empty",
  "request_id": "...",
  "timestamp": "..."
}
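
Since every response in this document shares the same envelope (`success`, `data`, `error`, `request_id`, `timestamp`), one helper can handle both outcomes. The following is a sketch under that assumption; `ApiError` and `unwrap` are hypothetical names, not part of the platform.

```python
class ApiError(Exception):
    """Raised when the response envelope reports success = false."""

    def __init__(self, message, request_id=None):
        super().__init__(message)
        self.request_id = request_id


def unwrap(envelope):
    """Return `data` on success; raise ApiError carrying `error` otherwise."""
    if envelope.get("success"):
        return envelope["data"]
    raise ApiError(
        envelope.get("error") or "unknown error",
        envelope.get("request_id"),
    )
```

With `urllib`, a 400 response surfaces as `urllib.error.HTTPError`; its body can be read with `err.read()`, decoded with `json.loads`, and passed to `unwrap` to obtain the same `ApiError`.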

2. Classification

Classify a query's complexity without performing the actual inference. Useful for pre-flight checks or cost estimation.

  • URL: /api/v1/classify
  • Method: POST
  • Content-Type: application/json

Request Body

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| query | string | Yes | The query to classify. |

Example Request

json
{
  "query": "What is 2+2?"
}

Success Response (200 OK)

json
{
  "success": true,
  "data": {
    "complexity": "simple",
    "confidence": 0.99,
    "scores": {
      "simple": 0.99,
      "complex": 0.01
    }
  },
  "error": null,
  "request_id": "...",
  "timestamp": "..."
}
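
One way to use this response as a pre-flight check is to act on the predicted label only when the classifier is confident. The sketch below is illustrative: the `preflight` function and the 0.9 threshold are assumptions, not platform defaults.

```python
def preflight(classification, min_confidence=0.9):
    """Return the predicted complexity, or "uncertain" below the threshold.

    `classification` is the `data` object of a /api/v1/classify response.
    """
    if classification["confidence"] < min_confidence:
        return "uncertain"
    return classification["complexity"]


def top_label(classification):
    """Return the label with the highest score in the `scores` object."""
    scores = classification["scores"]
    return max(scores, key=scores.get)
```

A caller might, for example, ask for confirmation before submitting a query that comes back as "complex" or "uncertain", since those are the cases most likely to reach the larger model.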

3. Health Check

Check the health status of the service and its dependencies (Redis and the models).

  • URL: /health
  • Method: GET

Success Response (200 OK)

json
{
  "success": true,
  "data": {
    "status": "healthy",
    "version": "0.1.0",
    "uptime_seconds": 3600,
    "services": {
      "redis": true,
      "simple_model": true,
      "complex_model": true
    }
  },
  "error": null,
  "request_id": "...",
  "timestamp": "..."
}
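
A readiness probe can be derived from this payload by checking `status` together with the per-dependency flags. This is a sketch: the function names are hypothetical, and it assumes "healthy" is the only passing status value, which this document does not state.

```python
def failing_services(health_data):
    """Return the sorted names of dependencies reported as false."""
    services = health_data.get("services", {})
    return sorted(name for name, ok in services.items() if not ok)


def is_ready(health_data):
    """True when status is "healthy" and no dependency is failing."""
    return health_data.get("status") == "healthy" and not failing_services(health_data)
```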

4. Metrics

Retrieve Prometheus metrics for monitoring.

  • URL: /metrics
  • Method: GET
  • Format: Prometheus text format
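
For ad hoc inspection without a Prometheus server, the text format can be read with a few lines of Python. The sketch below is deliberately simplified: it skips comment lines (`# HELP`, `# TYPE`) and assumes samples carry no trailing timestamp. The metric names actually exported by the platform are not listed in this document, so any names used with it are placeholders.

```python
def parse_metrics(text):
    """Map each series (name plus labels) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        # The value is the last whitespace-separated token on the line.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples
```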

5. Statistics

Get runtime statistics about cache usage and routing.

  • URL: /api/stats
  • Method: GET

Success Response (200 OK)

json
{
  "success": true,
  "data": {
    "cache_stats": {
      "local_cache_size": 150,
      "redis_connected": true,
      "redis_key_count": 5000
    },
    "router_stats": {
      "total_requests": 1000,
      "simple_routed": 800,
      "complex_routed": 200
    },
    "active_requests": 5
  },
  "error": null,
  "request_id": "...",
  "timestamp": "..."
}
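
The router counters are easier to read as shares of total traffic. A small sketch, with `routing_share` as a hypothetical helper operating on the `router_stats` object:

```python
def routing_share(router_stats):
    """Return the fraction of requests routed to each model tier."""
    total = router_stats["total_requests"]
    if total == 0:
        return {"simple": 0.0, "complex": 0.0}  # avoid division by zero
    return {
        "simple": router_stats["simple_routed"] / total,
        "complex": router_stats["complex_routed"] / total,
    }
```

For the example above, 800 of 1000 requests went to the simple model and 200 to the complex model, giving shares of 0.8 and 0.2.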

6. Clear Cache

Clear both the local and Redis caches. This endpoint is intended for administrators only, but the current implementation does not enforce access control, so it is open to any caller.

  • URL: /api/cache/clear
  • Method: POST

Success Response (200 OK)

json
{
  "success": true,
  "data": "Cache cleared successfully",
  "error": null,
  "request_id": "...",
  "timestamp": "..."
}
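
Because this call is destructive, a client wrapper may want an explicit opt-in before issuing it. The sketch below builds the request with the standard library; the `confirm` guard and the function name are assumptions for illustration, and no request body is sent since none is documented.

```python
import urllib.request

BASE_URL = "http://localhost:8080"  # default base URL from this document


def build_clear_cache_request(confirm=False):
    """Build (but do not send) a POST request for /api/cache/clear."""
    if not confirm:
        raise RuntimeError("pass confirm=True to clear local and Redis caches")
    return urllib.request.Request(BASE_URL + "/api/cache/clear", method="POST")
```

Passing the result to `urllib.request.urlopen` sends the request to a running instance.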