Intelligent query classification, multi-level semantic caching, and automatic model routing — all in a single Rust-powered inference layer.
// capabilities
Multi-factor heuristics plus embedding analysis classify every query in under 50 ms, then route it to the right model automatically (see the classification sketch below).
A local Moka LRU cache serves microsecond hits; Redis adds distributed, persistent storage with cosine-similarity matching across sessions (see the lookup sketch below).
Pre-wired Grafana dashboards out of the box: request histograms, cache hit ratios, and model health gauges.
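
A minimal Rust sketch of how the classify-and-route step could work, using cheap lexical heuristics as one factor (the embedding signal is left as a comment). The type names, marker words, and word-count cutoff are illustrative assumptions, not the platform's actual API.

#[derive(Debug)]
enum ModelTier {
    Small, // cheap, fast model for simple lookups
    Large, // stronger model for reasoning-heavy queries
}

// Score a query with lexical heuristics. A real classifier would combine
// this with an embedding-similarity signal before picking a tier.
fn classify(query: &str) -> ModelTier {
    let words = query.split_whitespace().count();
    let markers = ["explain", "why", "prove", "compare", "derive"];
    let lower = query.to_lowercase();
    let has_marker = markers.iter().any(|m| lower.contains(m));
    if words > 20 || has_marker {
        ModelTier::Large
    } else {
        ModelTier::Small
    }
}

fn main() {
    println!("{:?}", classify("Explain quantum computing")); // Large
    println!("{:?}", classify("capital of France"));         // Small
}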
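
And the two-level lookup described above, as a minimal sketch: an in-process Moka cache for exact hits, with the Redis layer stubbed as an in-memory list of (embedding, answer) pairs. The 0.9 similarity threshold and all function names here are assumptions, not documented defaults.

// Cargo.toml: moka = { version = "0.12", features = ["sync"] }
use moka::sync::Cache;

// Cosine similarity between two embedding vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

fn lookup(
    local: &Cache<String, String>,
    distributed: &[(Vec<f32>, String)], // stand-in for the Redis layer
    query: &str,
    query_embedding: &[f32],
) -> Option<String> {
    // Level 1: exact-key hit in the local Moka cache (microseconds).
    if let Some(hit) = local.get(query) {
        return Some(hit);
    }
    // Level 2: cosine-similarity scan over the shared store.
    distributed
        .iter()
        .find(|(emb, _)| cosine(emb, query_embedding) > 0.9)
        .map(|(_, answer)| answer.clone())
}

fn main() {
    let local: Cache<String, String> = Cache::new(10_000);
    local.insert("Explain quantum computing".into(), "...cached answer...".into());
    println!("{:?}", lookup(&local, &[], "Explain quantum computing", &[0.1, 0.2]));
}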
// quick start
Spin up the full stack — platform, Redis, Prometheus, and Grafana — in a single command. Bring your own API key.
Full setup guide →

# Run standalone
docker run \
  -p 8080:8080 \
  himanshu806/ai-inference-platform
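# The platform calls an upstream LLM provider, so pass your API key in with
# -e. The variable name below is an assumption; check the image docs for the
# exact one:
#   docker run -p 8080:8080 -e OPENAI_API_KEY=sk-... himanshu806/ai-inference-platform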
# Full stack (Redis + monitoring)
docker-compose up -d
# Test it
curl -X POST http://localhost:8080/api/v1/infer \
  -H 'Content-Type: application/json' \
  -d '{"query":"Explain quantum computing"}'

Open source · MIT license · Built in Rust