# Roadmap & Future Plans
This document outlines the strategic direction and planned improvements for the AI Inference Platform.
## Short Term (Q1-Q2)
### 1. Classification Improvements
- Advanced Heuristics: Incorporate domain-specific keywords and pattern matching for better intent detection.
- Feedback Loop: Implement a mechanism to "correct" misclassifications based on user feedback or downstream metrics (e.g., if a simple model fails to answer).
- Model-Based Classification: Experiment with using a tiny, ultra-fast SLM (Small Language Model) solely for the classification step, replacing static heuristics.
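To make the heuristics idea concrete, here is a minimal sketch of keyword/pattern-based intent detection. The labels, regex rules, and fallback behavior are illustrative assumptions, not the platform's actual classifier:

```python
import re

# Illustrative rule table: each intent label maps to regex patterns.
# These labels and patterns are hypothetical examples only.
RULES = {
    "code": [r"\bdef\b", r"\bfunction\b", r"stack trace", r"compile error"],
    "math": [r"\bsolve\b", r"\bintegral\b", r"\d+\s*[\+\-\*/]\s*\d+"],
    "chat": [r"\bhello\b", r"\bthanks\b"],
}

def classify(prompt: str, default: str = "general") -> str:
    """Return the label with the most pattern hits; fall back to a default."""
    lowered = prompt.lower()
    scores = {
        label: sum(bool(re.search(p, lowered)) for p in patterns)
        for label, patterns in RULES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default

print(classify("Please solve 3 + 4 for me"))  # math
```

A feedback loop could then demote rules whose matches correlate with downstream failures, and a tiny SLM could replace the whole rule table once it beats these heuristics on accuracy and latency.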
### 2. Low-Power Vector Database
- Embedded Vector Search: Integrate a lightweight, process-embedded vector store (like `sqlite-vss` or a Rust-native solution like `LanceDB`) to reduce the dependency on external Redis for vector operations.
- Quantization: Implement vector quantization to reduce memory footprint and increase search speed on low-power hardware.
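As a rough illustration of the quantization trade-off, the sketch below applies symmetric int8 scalar quantization to a float32 embedding, cutting memory 4x at the cost of a small reconstruction error. This is a generic technique shown for intuition; embedded stores typically implement their own quantization internally:

```python
import numpy as np

def quantize(vec: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float32 values onto int8 with a single symmetric scale factor."""
    scale = float(np.abs(vec).max()) / 127.0 or 1.0  # guard all-zero vectors
    q = np.round(vec / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original float32 vector."""
    return q.astype(np.float32) * scale

v = np.array([0.12, -0.5, 0.33], dtype=np.float32)
q, s = quantize(v)
print(q.nbytes, v.nbytes)  # 3 12 -> 4x smaller
```

On low-power hardware the win is twofold: more vectors fit in RAM, and int8 distance computations are cheaper than float32 ones.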
### 3. Expanded Model Support
- Generic OpenAI-Compatible Client: Reliable support for any provider adhering to the OpenAI API standard (e.g. Groq, Together.ai, vLLM).
- Local Model Optimization: Better integration with local inference engines like `llama.cpp` directly via bindings, bypassing HTTP overhead.
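The "generic client" idea works because OpenAI-compatible providers differ only in base URL and credentials: the `/chat/completions` path, Bearer auth, and request body shape are shared. A minimal sketch (the base URL and model name below are illustrative; check each provider's docs):

```python
import json
import urllib.request

class OpenAICompatClient:
    """Targets any OpenAI-compatible provider by swapping base_url."""

    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key

    def build_request(self, model: str, messages: list[dict]) -> urllib.request.Request:
        # Standard OpenAI-style chat completion payload.
        body = json.dumps({"model": model, "messages": messages}).encode()
        return urllib.request.Request(
            f"{self.base_url}/chat/completions",
            data=body,
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json",
            },
        )

# Example values only; substitute a real key and the provider's documented URL.
client = OpenAICompatClient("https://api.groq.com/openai/v1", "sk-...")
req = client.build_request("llama3-8b", [{"role": "user", "content": "hi"}])
print(req.full_url)
```

A local `llama.cpp` binding would bypass this HTTP layer entirely, calling the inference engine in-process instead.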
## Medium Term (Q3-Q4)
### 4. Adaptive Routing
- Cost-Aware Routing: Dynamically route based on a real-time "budget" per user or tenant.
- Latency-Aware Routing: Track model latency health and automatically route away from slow providers.
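One plausible shape for latency-aware routing is an exponential moving average of observed latency per provider, routing each request to the currently healthiest one. The provider names and the 0.2 smoothing factor below are illustrative assumptions:

```python
class LatencyRouter:
    """Routes to the provider with the lowest smoothed observed latency."""

    def __init__(self, providers: list[str], alpha: float = 0.2):
        # 0.0 marks a provider as not-yet-measured, so it gets tried first.
        self.ema = {p: 0.0 for p in providers}
        self.alpha = alpha

    def record(self, provider: str, latency_ms: float) -> None:
        prev = self.ema[provider]
        self.ema[provider] = latency_ms if prev == 0 else (
            self.alpha * latency_ms + (1 - self.alpha) * prev
        )

    def pick(self) -> str:
        return min(self.ema, key=self.ema.get)

router = LatencyRouter(["groq", "together"])
router.record("groq", 120.0)
router.record("together", 480.0)
print(router.pick())  # groq
```

Cost-aware routing would follow the same pattern with a per-tenant budget counter in place of (or combined with) the latency score.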
### 5. Enterprise Features
- Multi-Tenancy: Strict data and configuration isolation for serving multiple distinct teams or customers.
- Rate Limiting: Advanced, distributed rate limiting strategies (token bucket, leaky bucket) keyed by user tier.
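For intuition, here is a minimal single-process token bucket keyed to a tier's capacity and refill rate. The numbers are illustrative, and the distributed version planned above would hold this state in shared storage rather than in-process:

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`, sustained rate of `refill_per_sec`."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.refill = refill_per_sec
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        # Lazily refill based on elapsed time, then try to spend.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# Hypothetical free tier: burst of 5, then 1 request/sec sustained.
free_tier = TokenBucket(capacity=5, refill_per_sec=1.0)
print([free_tier.allow() for _ in range(6)])  # sixth request is rejected
```

A leaky bucket differs only in smoothing: it drains requests at a fixed rate instead of permitting bursts up to capacity.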
## Long Term
### 6. Edge Deployment
- WebAssembly (Wasm) Support: Compile the core logic to Wasm to run on edge workers (Cloudflare Workers, Fastly) so routing decisions happen at the network edge with minimal added latency.
- Mobile SDK: A stripped-down version of the router for direct embedding in mobile apps.
> **Note:** This roadmap is subject to change based on community feedback and evolving requirements.