AI Inference Platform API Documentation

Overview

The AI Inference Platform provides a high-performance HTTP API for querying AI models with intelligent routing, caching, and classification.

Base URL: http://localhost:8080 (default)
Version: v1

Endpoints

1. Inference

Submit a query to the AI platform. The system classifies the query's complexity and routes it to the appropriate model: a smaller, faster model for simple queries, or a larger, more capable model for complex reasoning.

URL: /api/v1/infer
Method: POST
Content-Type: application/json

Request Body

Field	Type	Required	Description
`query`	string	Yes	The text query to process.
`context`	string	No	Additional context for the query.
`max_tokens`	integer	No	Maximum number of tokens to generate.
`temperature`	float	No	Sampling temperature (0.0 to 1.0).
`user_id`	string	No	Unique identifier for the user.
`session_id`	string	No	Unique identifier for the session.
`skip_cache`	boolean	No	If `true`, bypasses the cache and forces a fresh inference.

Example Request

json

{
  "query": "Explain quantum entanglement in simple terms",
  "temperature": 0.7,
  "max_tokens": 150
}
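As a rough illustration, the request above could be built from Python using only the standard library. This is a sketch, not an official client: `build_infer_request` and `send` are hypothetical helper names, the base URL is the documented default, and the client-side empty-query check is an assumption that mirrors the documented 400 error.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # documented default base URL


def build_infer_request(query, **options):
    """Build (but do not send) a POST request for /api/v1/infer.

    `options` may carry the optional fields from the request body table
    (context, max_tokens, temperature, user_id, session_id, skip_cache).
    """
    if not query:
        # Assumption: reject empty queries locally instead of waiting for a 400.
        raise ValueError("Query cannot be empty")
    payload = {"query": query, **options}
    return urllib.request.Request(
        url=BASE_URL + "/api/v1/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30.0):
    """Send the request and decode the JSON body (requires a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_infer_request(
    "Explain quantum entanglement in simple terms",
    temperature=0.7,
    max_tokens=150,
)
print(req.get_method(), req.full_url)
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so a caller of `send` would need to catch that exception to read an error body.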

Success Response (200 OK)

json

{
  "success": true,
  "data": {
    "response": "Quantum entanglement is a phenomenon where...",
    "model_used": "complex-model-v4",
    "complexity": "complex",
    "confidence": 0.98,
    "latency_ms": 1250,
    "cached": false,
    "tokens_used": {
      "prompt_tokens": 15,
      "completion_tokens": 85,
      "total_tokens": 100,
      "estimated_cost": 0.002
    }
  },
  "error": null,
  "request_id": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2023-10-27T10:00:00Z"
}

Error Response (400 Bad Request)

json

{
  "success": false,
  "data": null,
  "error": "Query cannot be empty",
  "request_id": "...",
  "timestamp": "..."
}
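Success and error responses share the same envelope (`success`, `data`, `error`, `request_id`, `timestamp`), so a client can unwrap them in one place. The following is a minimal sketch; `ApiError` and `unwrap` are hypothetical names, not part of the platform.

```python
import json


class ApiError(Exception):
    """Raised when an envelope reports success=false (hypothetical helper)."""

    def __init__(self, message, request_id=None):
        super().__init__(message)
        self.request_id = request_id


def unwrap(envelope):
    """Return the `data` field of an envelope, or raise ApiError."""
    if envelope.get("success"):
        return envelope["data"]
    raise ApiError(
        envelope.get("error") or "unknown error",
        envelope.get("request_id"),
    )


# Trimmed-down envelopes modeled on the examples in this section.
ok = json.loads('{"success": true, "data": {"cached": false}, "error": null}')
bad = json.loads('{"success": false, "data": null, "error": "Query cannot be empty"}')

print(unwrap(ok))
try:
    unwrap(bad)
except ApiError as exc:
    print("request failed:", exc)
```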

2. Classification

Classify a query's complexity without performing the actual inference. Useful for pre-flight checks or cost estimation.

URL: /api/v1/classify
Method: POST
Content-Type: application/json

Request Body

Field	Type	Required	Description
`query`	string	Yes	The query to classify.

Example Request

json

{
  "query": "What is 2+2?"
}

Success Response (200 OK)

json

{
  "success": true,
  "data": {
    "complexity": "simple",
    "confidence": 0.99,
    "scores": {
      "simple": 0.99,
      "complex": 0.01
    }
  },
  "error": null,
  "request_id": "...",
  "timestamp": "..."
}
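One possible pre-flight check uses the classification result to decide how to proceed before paying for inference. The sketch below assumes the caller already holds the `data` object from the response; the `preflight` name and the 0.9 confidence threshold are illustrative choices, not platform behavior.

```python
def preflight(classification, min_confidence=0.9):
    """Return the predicted complexity label, or "uncertain".

    `classification` is the `data` object from /api/v1/classify.
    The threshold is an assumption chosen for illustration.
    """
    if classification["confidence"] < min_confidence:
        return "uncertain"
    return classification["complexity"]


sample = {
    "complexity": "simple",
    "confidence": 0.99,
    "scores": {"simple": 0.99, "complex": 0.01},
}
print(preflight(sample))
```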

3. Health Check

Check the health status of the service and its dependencies (Redis and the simple and complex models).

URL: /health
Method: GET

Success Response (200 OK)

json

{
  "success": true,
  "data": {
    "status": "healthy",
    "version": "0.1.0",
    "uptime_seconds": 3600,
    "services": {
      "redis": true,
      "simple_model": true,
      "complex_model": true
    }
  },
  "error": null,
  "request_id": "...",
  "timestamp": "..."
}
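A monitoring script might reduce the `services` map to a list of failing dependencies. This is a small sketch over the response shape shown above; `down_services` is a hypothetical helper.

```python
def down_services(health):
    """Return the sorted names of dependencies reported as not healthy.

    `health` is the `data` object from the /health response.
    """
    services = health.get("services", {})
    return sorted(name for name, up in services.items() if not up)


sample = {
    "status": "healthy",
    "version": "0.1.0",
    "uptime_seconds": 3600,
    "services": {"redis": True, "simple_model": True, "complex_model": True},
}
print(down_services(sample))
```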

4. Metrics

Exposes Prometheus metrics for scraping by a monitoring system.

URL: /metrics
Method: GET
Format: Prometheus text format
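For ad hoc inspection without a Prometheus server, the text format can be read with a few lines of Python. The sketch below is deliberately simplified: it assumes samples carry no trailing timestamp, and the metric names in `SAMPLE` are made up for illustration, since this page does not list the metrics the platform actually exports.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition into a {series: value} dict.

    Simplification: assumes each sample line is `<series> <value>`
    with no trailing timestamp. Comment and blank lines are skipped.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the last run of whitespace so label values may contain spaces.
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples


# Hypothetical metric names, for illustration only.
SAMPLE = """\
# HELP example_requests_total Hypothetical request counter.
# TYPE example_requests_total counter
example_requests_total{route="simple"} 800
example_requests_total{route="complex"} 200
"""

metrics = parse_metrics(SAMPLE)
print(metrics)
```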

5. Statistics

Get runtime statistics about cache usage and routing.

URL: /api/stats
Method: GET

Success Response (200 OK)

json

{
  "success": true,
  "data": {
    "cache_stats": {
      "local_cache_size": 150,
      "redis_connected": true,
      "redis_key_count": 5000
    },
    "router_stats": {
      "total_requests": 1000,
      "simple_routed": 800,
      "complex_routed": 200
    },
    "active_requests": 5
  },
  "error": null,
  "request_id": "...",
  "timestamp": "..."
}
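The router counters can be turned into routing shares with simple arithmetic. The sketch below works on the `data` object shown above; `routing_share` is a hypothetical helper.

```python
def routing_share(stats):
    """Return the fraction of requests routed to each model tier.

    `stats` is the `data` object from the /api/stats response.
    """
    router = stats["router_stats"]
    total = router["total_requests"]
    if total == 0:
        # Avoid division by zero on a freshly started service.
        return {"simple": 0.0, "complex": 0.0}
    return {
        "simple": router["simple_routed"] / total,
        "complex": router["complex_routed"] / total,
    }


sample = {
    "router_stats": {
        "total_requests": 1000,
        "simple_routed": 800,
        "complex_routed": 200,
    }
}
print(routing_share(sample))
```

With the example figures, 800 of 1000 requests went to the simple model and 200 to the complex one.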

6. Clear Cache

Clear both the local and Redis caches. This endpoint is intended for administrators only, but the current implementation does not enforce access control, so it is open to any caller.

URL: /api/cache/clear
Method: POST

Success Response (200 OK)

json

{
  "success": true,
  "data": "Cache cleared successfully",
  "error": null,
  "request_id": "...",
  "timestamp": "..."
}
