v0.1.0 — Rust · Axum · Redis

Route smarter.
Infer faster.

Intelligent query classification, multi-level semantic caching, and automatic model routing — all in a single Rust-powered inference layer.

Headline stats: L1 cache lookup latency (ms) · cached throughput (RPS) · semantic match threshold (%) · live inference throughput (RPS) · live routing log

// capabilities

routing
01

Intelligent Model Routing

Multi-factor heuristics plus embedding analysis classify every query in under 50ms, then route it to the right model automatically.
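
To make the idea concrete, here is a minimal, self-contained sketch of heuristic query classification. The tier names, word-count cutoffs, and keyword cues are illustrative assumptions rather than the platform's actual rules, and a real router would combine them with embedding similarity.

rust
// Illustrative heuristic router (names, cutoffs, and cues are assumptions).
#[derive(Debug)]
enum ModelTier {
    Fast,     // small, cheap model for short lookups
    Balanced, // mid-size model for general queries
    Heavy,    // large model for code or multi-step reasoning
}

// Classify a query with simple lexical signals; a production router would also
// score embedding similarity against labeled example queries.
fn route(query: &str) -> ModelTier {
    let words = query.split_whitespace().count();
    let lower = query.to_lowercase();
    let needs_reasoning = ["explain", "why", "compare", "prove", "derive"]
        .iter()
        .any(|cue| lower.contains(cue));
    let has_code = query.contains("```") || query.contains("fn ") || query.contains("def ");

    match (words, needs_reasoning || has_code) {
        (_, true) => ModelTier::Heavy,
        (0..=8, false) => ModelTier::Fast,
        _ => ModelTier::Balanced,
    }
}

fn main() {
    for q in ["What is Rust?", "Explain quantum computing step by step"] {
        println!("{q:?} -> {:?}", route(q));
    }
}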

caching
02

Two-Level Semantic Cache

Local Moka LRU for microsecond hits. Redis for distributed, persistent storage with cosine-similarity matching across sessions.
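
As a rough illustration of the L2 matching step, the sketch below compares a query embedding against cached entries with cosine similarity and returns a hit above a threshold. The 0.85 threshold, function names, and in-memory storage are assumptions; the platform itself keeps entries in Redis.

rust
// Illustrative semantic lookup (threshold, names, and storage are assumptions).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

// Return the first cached response whose embedding clears the similarity threshold.
fn semantic_lookup<'a>(
    query_emb: &[f32],
    cache: &'a [(Vec<f32>, String)],
    threshold: f32,
) -> Option<&'a str> {
    cache
        .iter()
        .find(|(emb, _)| cosine_similarity(query_emb, emb) >= threshold)
        .map(|(_, resp)| resp.as_str())
}

fn main() {
    let cache = vec![(vec![0.90, 0.10, 0.00], "cached answer".to_string())];
    let hit = semantic_lookup(&[0.88, 0.12, 0.01], &cache, 0.85);
    println!("{hit:?}"); // prints Some("cached answer"): similarity is above 0.85
}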

observability
03

Prometheus + Grafana

Pre-wired dashboards, request histograms, cache hit ratios, and model health gauges out of the box.
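
For a feel of what gets exported, here is a hedged sketch using the prometheus crate to register a latency histogram and a cache-hit counter, then render the text format Prometheus scrapes. The metric names are illustrative, not the platform's actual series.

rust
// Minimal metrics sketch (metric names are assumptions, not the real series).
use prometheus::{Encoder, Histogram, HistogramOpts, IntCounter, Registry, TextEncoder};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let registry = Registry::new();

    // Request latency histogram and cache hit counter (illustrative names).
    let latency = Histogram::with_opts(HistogramOpts::new(
        "inference_request_seconds",
        "End-to-end inference request latency",
    ))?;
    let cache_hits = IntCounter::new("cache_hits_total", "Semantic cache hits")?;
    registry.register(Box::new(latency.clone()))?;
    registry.register(Box::new(cache_hits.clone()))?;

    // Record one sample request.
    latency.observe(0.042);
    cache_hits.inc();

    // Render the text exposition format that Prometheus scrapes.
    let mut buf = Vec::new();
    TextEncoder::new().encode(&registry.gather(), &mut buf)?;
    println!("{}", String::from_utf8(buf)?);
    Ok(())
}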

// quick start

Docker-first,
zero config.

Spin up the full stack — platform, Redis, Prometheus, and Grafana — in a single command. Bring your own API key.

Full setup guide →
bash
# Run standalone
docker run \
  -p 8080:8080 \
  himanshu806/ai-inference-platform

# Full stack (Redis + monitoring)
docker-compose up -d

# Test it
curl -X POST http://localhost:8080/api/v1/infer \
  -H 'Content-Type: application/json' \
  -d '{"query":"Explain quantum computing"}'

Ship inference
infrastructure today.

Open source · MIT license · Built in Rust