# Introduction
Welcome to the AI Inference Platform documentation.
## What is it?
The AI Inference Platform is a high-performance, intelligent gateway that sits between your applications and AI models. It reduces the complexity of managing AI inference at scale by providing:
- **Intelligent Routing**: Automatically routes each query to the most cost-effective model that can handle it, based on query complexity.
- **Smart Caching**: Reduces latency and costs by caching responses, using semantic matching to serve similar queries from the cache.
- **Unified API**: A single, clean API for all your inference needs, regardless of the underlying model provider (see the sketch below).
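
To make the unified API concrete, here is a minimal sketch of a request to the gateway. The endpoint path, port, and the `"model": "auto"` convention are illustrative assumptions, not the platform's documented contract; see the Usage Guide for the actual API.

```python
# Minimal sketch of a request to the gateway's unified API.
# Hypothetical: the endpoint URL and "model": "auto" (deferring model
# choice to the intelligent router) are assumptions for illustration.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",  # assumed local gateway address
    json={
        "model": "auto",  # assumed: let the router pick the model
        "messages": [{"role": "user", "content": "What is 2+2?"}],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())
```

Because the API is provider-agnostic, the same request shape works whether the router sends the query to a small, cheap model or a large one.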
## Why use it?
- **Cost Savings**: Route simple queries (e.g., "What is 2+2?") to smaller, cheaper models.
- **Lower Latency**: Serve cached responses instantly (milliseconds instead of seconds).
- **Reliability**: Fallback mechanisms keep the platform available even if a provider goes down.
- **Observability**: Built-in metrics and logging to understand your AI traffic.
## Getting Started
Check out the Usage Guide to learn how to set up and run the platform.
## Architecture
To understand how it works under the hood, see the Architecture overview.