# Roadmap & Future Plans
This document outlines the strategic direction and planned improvements for the AI Inference Platform.
## Short Term (Q1-Q2)
### 1. Classification Improvements
- Advanced Heuristics: Incorporate domain-specific keywords and pattern matching for better intent detection.
- Feedback Loop: Implement a mechanism to "correct" misclassifications based on user feedback or downstream metrics (e.g., if a simple model fails to answer).
- Model-Based Classification: Experiment with using a tiny, ultra-fast SLM (Small Language Model) solely for the classification step, replacing static heuristics.
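To make the heuristics idea concrete, here is a minimal sketch of keyword/pattern-based intent detection. The labels, regex rules, and fallback behavior are illustrative assumptions, not the platform's actual classifier:

```python
import re

# Illustrative rule table: each intent label maps to regex patterns.
# These labels and patterns are hypothetical examples only.
RULES = {
    "code": [r"\bdef\b", r"\bfunction\b", r"stack trace", r"compile error"],
    "math": [r"\bsolve\b", r"\bintegral\b", r"\d+\s*[\+\-\*/]\s*\d+"],
    "chat": [r"\bhello\b", r"\bthanks\b"],
}

def classify(prompt: str, default: str = "general") -> str:
    """Return the label with the most pattern hits; fall back to a default."""
    lowered = prompt.lower()
    scores = {
        label: sum(bool(re.search(p, lowered)) for p in patterns)
        for label, patterns in RULES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default

print(classify("Please solve 3 + 4 for me"))  # math
```

A feedback loop could then demote rules whose matches correlate with downstream failures, and a tiny SLM could replace the whole rule table once it beats these heuristics on accuracy and latency.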
### 2. Low-Power Vector Database
- Embedded Vector Search: Integrate a lightweight, process-embedded vector store (like `sqlite-vss` or a Rust-native solution like `LanceDB`) to reduce the dependency on external Redis for vector operations.
- Quantization: Implement vector quantization to reduce memory footprint and increase search speed on low-power hardware.
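As a rough illustration of the quantization trade-off, the sketch below applies symmetric int8 scalar quantization to a float32 embedding, cutting memory 4x at the cost of a small reconstruction error. This is a generic technique shown for intuition; embedded stores typically implement their own quantization internally:

```python
import numpy as np

def quantize(vec: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float32 values onto int8 with a single symmetric scale factor."""
    scale = float(np.abs(vec).max()) / 127.0 or 1.0  # guard all-zero vectors
    q = np.round(vec / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximate reconstruction of the original float32 vector."""
    return q.astype(np.float32) * scale

v = np.array([0.12, -0.5, 0.33], dtype=np.float32)
q, s = quantize(v)
print(q.nbytes, v.nbytes)  # 3 12 -> 4x smaller
```

On low-power hardware the win is twofold: more vectors fit in RAM, and int8 distance computations are cheaper than float32 ones.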
### 3. Expanded Model Support
- Generic OpenAI-Compatible Client: Reliable support for any provider adhering to the OpenAI API standard (e.g. Groq, Together.ai, vLLM).
- Local Model Optimization: Better integration with local inference engines like `llama.cpp` directly via bindings, bypassing HTTP overhead.
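The "generic client" idea works because OpenAI-compatible providers differ only in base URL and credentials: the `/chat/completions` path, Bearer auth, and request body shape are shared. A minimal sketch (the base URL and model name below are illustrative; check each provider's docs):

```python
import json
import urllib.request

class OpenAICompatClient:
    """Targets any OpenAI-compatible provider by swapping base_url."""

    def __init__(self, base_url: str, api_key: str):
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key

    def build_request(self, model: str, messages: list[dict]) -> urllib.request.Request:
        # Standard OpenAI-style chat completion payload.
        body = json.dumps({"model": model, "messages": messages}).encode()
        return urllib.request.Request(
            f"{self.base_url}/chat/completions",
            data=body,
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json",
            },
        )

# Example values only; substitute a real key and the provider's documented URL.
client = OpenAICompatClient("https://api.groq.com/openai/v1", "sk-...")
req = client.build_request("llama3-8b", [{"role": "user", "content": "hi"}])
print(req.full_url)
```

A local `llama.cpp` binding would bypass this HTTP layer entirely, calling the inference engine in-process instead.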
## Medium Term (Q3-Q4)
### 4. Adaptive Routing
- Cost-Aware Routing: Dynamically route based on a real-time "budget" per user or tenant.
- Latency-Aware Routing: Track model latency health and automatically route away from slow providers.
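One plausible shape for latency-aware routing is an exponential moving average of observed latency per provider, routing each request to the currently healthiest one. The provider names and the 0.2 smoothing factor below are illustrative assumptions:

```python
class LatencyRouter:
    """Routes to the provider with the lowest smoothed observed latency."""

    def __init__(self, providers: list[str], alpha: float = 0.2):
        # 0.0 marks a provider as not-yet-measured, so it gets tried first.
        self.ema = {p: 0.0 for p in providers}
        self.alpha = alpha

    def record(self, provider: str, latency_ms: float) -> None:
        prev = self.ema[provider]
        self.ema[provider] = latency_ms if prev == 0 else (
            self.alpha * latency_ms + (1 - self.alpha) * prev
        )

    def pick(self) -> str:
        return min(self.ema, key=self.ema.get)

router = LatencyRouter(["groq", "together"])
router.record("groq", 120.0)
router.record("together", 480.0)
print(router.pick())  # groq
```

Cost-aware routing would follow the same pattern with a per-tenant budget counter in place of (or combined with) the latency score.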
### 5. Enterprise Features
- Multi-Tenancy: Strict data and configuration isolation for serving multiple distinct teams or customers.
- Rate Limiting: Advanced, distributed rate limiting strategies (token bucket, leaky bucket) keyed by user tier.
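For intuition, here is a minimal single-process token bucket keyed to a tier's capacity and refill rate. The numbers are illustrative, and the distributed version planned above would hold this state in shared storage rather than in-process:

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`, sustained rate of `refill_per_sec`."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.refill = refill_per_sec
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        # Lazily refill based on elapsed time, then try to spend.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# Hypothetical free tier: burst of 5, then 1 request/sec sustained.
free_tier = TokenBucket(capacity=5, refill_per_sec=1.0)
print([free_tier.allow() for _ in range(6)])  # sixth request is rejected
```

A leaky bucket differs only in smoothing: it drains requests at a fixed rate instead of permitting bursts up to capacity.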
## Long Term
### 6. Edge Deployment
- WebAssembly (Wasm) Support: Compile the core logic to Wasm to run on edge workers (Cloudflare Workers, Fastly) so routing decisions happen at the network edge with minimal added latency.
- Mobile SDK: A stripped-down version of the router for direct embedding in mobile apps.
> **Note:** This roadmap is subject to change based on community feedback and evolving requirements.