Docs

roadmap.mdx

Roadmap & Future Plans

This document outlines the strategic direction and planned improvements for the AI Inference Platform.

Short Term (Q1-Q2)

1. Classification Improvements

  • Advanced Heuristics: Incorporate domain-specific keywords and pattern matching for better intent detection.
  • Feedback Loop: Implement a mechanism to "correct" misclassifications based on user feedback or downstream metrics (e.g., if a simple model fails to answer).
  • Model-Based Classification: Experiment with using a tiny, ultra-fast SLM (Small Language Model) solely for the classification step, replacing static heuristics.

2. Low-Power Vector Database

  • Embedded Vector Search: Integrate a lightweight, process-embedded vector store (like sqlite-vss or a Rust native solution like LanceDB) to reduce the dependency on external Redis for vector operations.
  • Quantization: Implement vector quantization to reduce memory footprint and increase search speed on low-power hardware.

3. Expanded Model Support

  • Generic OpenAI-Compatible Client: reliable support for any provider adhering to the OpenAI API standard (Groq, Together.ai, vLLM).
  • Local Model Optimization: Better integration with local inference engines like llama.cpp directly via bindings, bypassing HTTP overhead.

Medium Term (Q3-Q4)

4. Adaptive Routing

  • Cost-Aware Routing: Dynamically route based on a real-time "budget" per user or tenant.
  • Latency-Aware Routing: Track model latency health and automatically route away from slow providers.

5. Enterprise Features

  • Multi-Tenancy: Strict data and configuration isolation for serving multiple distinct teams or customers.
  • Rate Limiting: Advanced, distributed rate limiting strategies (token bucket, leaky bucket) keyed by user tier.

Long Term

6. Edge Deployment

  • WebAssembly (Wasm) Support: Compile the core logic to Wasm to run on edge workers (Cloudflare Workers, Fastly) for zero-latency routing.
  • Mobile SDK: A stripped-down version of the router for direct embedding in mobile apps.

Note: This roadmap is subject to change based on community feedback and evolving requirements.