Fallbakit Docs

Customer documentation for connecting your applications to local-first Ollama, oMLX, or vLLM routing with secure cloud fallback.

Fallbakit gives your application one OpenAI-compatible API endpoint. Requests go to your customer-side Ollama, oMLX, or vLLM runtime first, then use your configured cloud provider only when local inference cannot serve.

Use these docs when you are connecting an application, installing the tunnel agent, managing provider credentials, or reviewing usage and billing.

Start here

Create an application for each product, environment, or customer workspace you want to isolate.
Add allowed domains or IP ranges before sending chat requests.
Create an application API key and store it in your own secret manager or environment.
Install the tunnel agent near the customer's local runtime host.
Send requests with the OpenAI SDK, Python SDK, Node.js SDK, or any OpenAI-compatible HTTP client pointed at https://api.fallbakit.com.

Customer tasks

Platform walkthrough: understand the dashboard, routing lifecycle, and integration flow.
Applications: manage app boundaries, keys, fallback credentials, and allowlists.
Tunnel: connect customer-side Ollama, oMLX, or vLLM to Fallbakit securely.
Providers: learn how application-scoped fallback credentials work inside the Applications workspace.
Usage stats: inspect traffic, local/cloud routing, and savings.
Billing: review plan state, metering, and account limits.

Integration shape

Your app calls client.chat.completions.create(...). With the official OpenAI SDKs, set the SDK base URL to https://api.fallbakit.com/v1; with Fallbakit's SDKs, use https://api.fallbakit.com. Fallbakit authenticates the application key, checks the allowlist, routes to the customer tunnel first, and falls back to the configured cloud provider only when local inference cannot serve the request.

Start here

Customer tasks

Integration shape

On this page