Intelligent query classification, multi-level semantic caching, and automatic model routing — all in a single Rust-powered inference layer.
// capabilities
Multi-factor heuristics plus embedding analysis classify every query in under 50 ms, then route it to the right model automatically (see the classification sketch below).
A local Moka LRU cache serves microsecond hits; Redis adds distributed, persistent storage with cosine-similarity matching across sessions (see the lookup sketch below).
Pre-wired Grafana dashboards out of the box: request histograms, cache hit ratios, and model health gauges.
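
A minimal Rust sketch of how the classify-and-route step could work, using cheap lexical heuristics as one factor (the embedding signal is left as a comment). The type names, marker words, and word-count cutoff are illustrative assumptions, not the platform's actual API.

#[derive(Debug)]
enum ModelTier {
    Small, // cheap, fast model for simple lookups
    Large, // stronger model for reasoning-heavy queries
}

// Score a query with lexical heuristics. A real classifier would combine
// this with an embedding-similarity signal before picking a tier.
fn classify(query: &str) -> ModelTier {
    let words = query.split_whitespace().count();
    let markers = ["explain", "why", "prove", "compare", "derive"];
    let lower = query.to_lowercase();
    let has_marker = markers.iter().any(|m| lower.contains(m));
    if words > 20 || has_marker {
        ModelTier::Large
    } else {
        ModelTier::Small
    }
}

fn main() {
    println!("{:?}", classify("Explain quantum computing")); // Large
    println!("{:?}", classify("capital of France"));         // Small
}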
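
And the two-level lookup described above, as a minimal sketch: an in-process Moka cache for exact hits, with the Redis layer stubbed as an in-memory list of (embedding, answer) pairs. The 0.9 similarity threshold and all function names here are assumptions, not documented defaults.

// Cargo.toml: moka = { version = "0.12", features = ["sync"] }
use moka::sync::Cache;

// Cosine similarity between two embedding vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

fn lookup(
    local: &Cache<String, String>,
    distributed: &[(Vec<f32>, String)], // stand-in for the Redis layer
    query: &str,
    query_embedding: &[f32],
) -> Option<String> {
    // Level 1: exact-key hit in the local Moka cache (microseconds).
    if let Some(hit) = local.get(query) {
        return Some(hit);
    }
    // Level 2: cosine-similarity scan over the shared store.
    distributed
        .iter()
        .find(|(emb, _)| cosine(emb, query_embedding) > 0.9)
        .map(|(_, answer)| answer.clone())
}

fn main() {
    let local: Cache<String, String> = Cache::new(10_000);
    local.insert("Explain quantum computing".into(), "...cached answer...".into());
    println!("{:?}", lookup(&local, &[], "Explain quantum computing", &[0.1, 0.2]));
}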
// quick start
Spin up the full stack — platform, Redis, Prometheus, and Grafana — in a single command. Bring your own API key.
Full setup guide →

# Run standalone
docker run \
  -p 8080:8080 \
  himanshu806/ai-inference-platform
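# The platform calls an upstream LLM provider, so pass your API key in with
# -e. The variable name below is an assumption; check the image docs for the
# exact one:
#   docker run -p 8080:8080 -e OPENAI_API_KEY=sk-... himanshu806/ai-inference-platform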
# Full stack (Redis + monitoring)
docker-compose up -d
# Test it
curl -X POST http://localhost:8080/api/v1/infer \
  -H 'Content-Type: application/json' \
  -d '{"query":"Explain quantum computing"}'

Open source · MIT license · Built in Rust