
Universal Endpoint (Deprecated)

The Universal Endpoint allows you to contact every provider through a single endpoint.

https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}

The payload expects an array of messages. Each message is an object with the following parameters:

  • provider: the name of the provider you would like to direct this message to. Can be openai, workers-ai, or any of our supported providers.
  • endpoint: the pathname of the provider API you are trying to reach. For example, on OpenAI it can be chat/completions, and for Workers AI this might be @cf/meta/llama-3.1-8b-instruct. Refer to the sections that are specific to each provider.
  • headers: the HTTP headers that should be sent when contacting this provider, including the Authorization header. Its value usually starts with Token or Bearer.
  • query: the payload as the provider expects it in their official API.

cURL example

Request
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id} \
  --header 'Content-Type: application/json' \
  --data '[
    {
      "provider": "workers-ai",
      "endpoint": "@cf/meta/llama-3.1-8b-instruct",
      "headers": {
        "Authorization": "Bearer {cloudflare_token}",
        "Content-Type": "application/json"
      },
      "query": {
        "messages": [
          {
            "role": "system",
            "content": "You are a friendly assistant"
          },
          {
            "role": "user",
            "content": "What is Cloudflare?"
          }
        ]
      }
    },
    {
      "provider": "openai",
      "endpoint": "chat/completions",
      "headers": {
        "Authorization": "Bearer {open_ai_token}",
        "Content-Type": "application/json"
      },
      "query": {
        "model": "gpt-4o-mini",
        "stream": true,
        "messages": [
          {
            "role": "user",
            "content": "What is Cloudflare?"
          }
        ]
      }
    }
  ]'

The above sends a request to the Workers AI inference API. If it fails, the gateway proceeds to OpenAI. You can add as many fallbacks as you need by appending objects to the array.
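The same fallback request can be issued from JavaScript with fetch. A minimal sketch, assuming the same placeholder tokens as the cURL example; the account and gateway IDs below are placeholders:

```javascript
// Sketch: calling the Universal Endpoint with fetch instead of cURL.
// ACCOUNT_ID, GATEWAY_ID, and the bearer tokens are placeholders.
const ACCOUNT_ID = "my-account-id";
const GATEWAY_ID = "my-gateway";

// Same fallback array as the cURL example: Workers AI first, OpenAI second.
const payload = [
  {
    provider: "workers-ai",
    endpoint: "@cf/meta/llama-3.1-8b-instruct",
    headers: {
      Authorization: "Bearer {cloudflare_token}",
      "Content-Type": "application/json",
    },
    query: {
      messages: [
        { role: "system", content: "You are a friendly assistant" },
        { role: "user", content: "What is Cloudflare?" },
      ],
    },
  },
  {
    provider: "openai",
    endpoint: "chat/completions",
    headers: {
      Authorization: "Bearer {open_ai_token}",
      "Content-Type": "application/json",
    },
    query: {
      model: "gpt-4o-mini",
      stream: true,
      messages: [{ role: "user", content: "What is Cloudflare?" }],
    },
  },
];

async function callUniversal() {
  const response = await fetch(
    `https://gateway.ai.cloudflare.com/v1/${ACCOUNT_ID}/${GATEWAY_ID}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  );
  if (!response.ok) throw new Error(`Gateway error: ${response.status}`);
  return response.json();
}
```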

Fallbacks

You can specify model or provider fallbacks to handle request failures and ensure reliability. The payload array defines the fallback sequence: if the first provider fails, the request falls back to the next entry in the array. For more details, refer to Fallbacks.

By default, Cloudflare triggers your fallback if a model request returns an error. You can also configure request timeouts to trigger fallbacks when a provider takes too long to respond.

Response header (cf-aig-step)

When using fallbacks, the response header cf-aig-step indicates which model successfully processed the request by returning the step number:

  • cf-aig-step:0 — The first (primary) model was used successfully.
  • cf-aig-step:1 — The request fell back to the second model.
  • cf-aig-step:2 — The request fell back to the third model.
  • Subsequent steps — Each fallback increments the step number by 1.
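A sketch of reading this header client-side to log which fallback entry served the request; the `describeStep` helper is illustrative, not part of the gateway API:

```javascript
// Sketch: inspecting cf-aig-step to see which fallback entry answered.
// `describeStep` is an illustrative helper, not part of the gateway API.
function describeStep(headers) {
  const step = headers.get("cf-aig-step");
  if (step === null) return "no fallback metadata on this response";
  return step === "0"
    ? "primary model handled the request"
    : `fell back to entry ${Number(step) + 1} in the payload array`;
}

// Example with a synthetic header set (a real one comes from fetch's response.headers):
const headers = new Headers({ "cf-aig-step": "1" });
console.log(describeStep(headers)); // "fell back to entry 2 in the payload array"
```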

Request timeouts

A request timeout triggers a fallback if a provider takes too long to respond.

Configure the timeout by setting a requestTimeout property (in milliseconds) within the provider-specific config object. Each provider can have a different requestTimeout value.

The timeout applies to the time until the first part of the response arrives. As long as the first chunk returns within the specified window (as when streaming a response), your gateway will wait for the rest of the response.

Request timeout example
curl 'https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}' \
  --header 'Content-Type: application/json' \
  --data '[
    {
      "provider": "workers-ai",
      "endpoint": "@cf/meta/llama-3.1-8b-instruct",
      "headers": {
        "Authorization": "Bearer {cloudflare_token}",
        "Content-Type": "application/json"
      },
      "config": {
        "requestTimeout": 1000
      },
      "query": {
        "messages": [
          {
            "role": "system",
            "content": "You are a friendly assistant"
          },
          {
            "role": "user",
            "content": "What is Cloudflare?"
          }
        ]
      }
    },
    {
      "provider": "workers-ai",
      "endpoint": "@cf/meta/llama-3.1-8b-instruct-fast",
      "headers": {
        "Authorization": "Bearer {cloudflare_token}",
        "Content-Type": "application/json"
      },
      "query": {
        "messages": [
          {
            "role": "system",
            "content": "You are a friendly assistant"
          },
          {
            "role": "user",
            "content": "What is Cloudflare?"
          }
        ]
      },
      "config": {
        "requestTimeout": 3000
      }
    }
  ]'

Request retries

The Universal Endpoint supports automatic retries for failed requests, with a maximum of five retry attempts. Retries are attempted before triggering any configured fallbacks.

Configure the retry settings with the following properties in the provider-specific config:

TypeScript
config: {
  maxAttempts?: number;
  retryDelay?: number;
  backoff?: "constant" | "linear" | "exponential";
}
  • maxAttempts: Maximum number of retry attempts (up to 5).
  • retryDelay: Delay before retrying, in milliseconds (maximum of 5 seconds).
  • backoff: Backoff method — constant, linear, or exponential.

On the final retry attempt, your gateway will wait until the request completes, regardless of how long it takes. Each provider can have different retry settings.
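The three backoff methods differ only in how the delay grows between attempts. A sketch of the schedule they imply; the exact growth factors here are an assumption, and the gateway computes this server-side:

```javascript
// Sketch: the delay (in ms) before retry attempt n (1-based) under each
// backoff method, given a base retryDelay. Illustrative only; the exact
// growth factors are assumptions and the gateway applies this server-side.
function retryDelayMs(backoff, retryDelay, attempt) {
  switch (backoff) {
    case "constant":
      return retryDelay; // same delay every time
    case "linear":
      return retryDelay * attempt; // grows by retryDelay each attempt
    case "exponential":
      return retryDelay * 2 ** (attempt - 1); // doubles each attempt
    default:
      throw new Error(`unknown backoff method: ${backoff}`);
  }
}

// With retryDelay = 1000 ms, the first four delays would be:
// constant:    1000, 1000, 1000, 1000
// linear:      1000, 2000, 3000, 4000
// exponential: 1000, 2000, 4000, 8000
```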

Request retry example
curl 'https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}' \
  --header 'Content-Type: application/json' \
  --data '[
    {
      "provider": "workers-ai",
      "endpoint": "@cf/meta/llama-3.1-8b-instruct",
      "headers": {
        "Authorization": "Bearer {cloudflare_token}",
        "Content-Type": "application/json"
      },
      "config": {
        "maxAttempts": 2,
        "retryDelay": 1000,
        "backoff": "constant"
      },
      "query": {
        "messages": [
          {
            "role": "system",
            "content": "You are a friendly assistant"
          },
          {
            "role": "user",
            "content": "What is Cloudflare?"
          }
        ]
      }
    },
    {
      "provider": "workers-ai",
      "endpoint": "@cf/meta/llama-3.1-8b-instruct-fast",
      "headers": {
        "Authorization": "Bearer {cloudflare_token}",
        "Content-Type": "application/json"
      },
      "query": {
        "messages": [
          {
            "role": "system",
            "content": "You are a friendly assistant"
          },
          {
            "role": "user",
            "content": "What is Cloudflare?"
          }
        ]
      },
      "config": {
        "maxAttempts": 4,
        "retryDelay": 1000,
        "backoff": "exponential"
      }
    }
  ]'

WebSockets API (beta)

The Universal Endpoint can also be accessed via a WebSockets API which provides a single persistent connection, enabling continuous communication. This API supports all AI providers connected to AI Gateway, including those that do not natively support WebSockets.

WebSockets example

JavaScript
import WebSocket from "ws";

const ws = new WebSocket(
  "wss://gateway.ai.cloudflare.com/v1/my-account-id/my-gateway/",
  {
    headers: {
      "cf-aig-authorization": "Bearer AI_GATEWAY_TOKEN",
    },
  },
);

// Wait for the connection to open before sending; calling send() while the
// socket is still connecting throws in the ws library.
ws.on("open", () => {
  ws.send(
    JSON.stringify({
      type: "universal.create",
      request: {
        eventId: "my-request",
        provider: "workers-ai",
        endpoint: "@cf/meta/llama-3.1-8b-instruct",
        headers: {
          Authorization: "Bearer WORKERS_AI_TOKEN",
          "Content-Type": "application/json",
        },
        query: {
          prompt: "tell me a joke",
        },
      },
    }),
  );
});

ws.on("message", function incoming(message) {
  console.log(message.toString());
});

Workers Binding example

JSONC
{
  "ai": {
    "binding": "AI",
  },
}
src/index.ts
type Env = {
  AI: Ai;
};

export default {
  async fetch(request: Request, env: Env) {
    return env.AI.gateway("my-gateway").run({
      provider: "workers-ai",
      endpoint: "@cf/meta/llama-3.1-8b-instruct",
      headers: {
        authorization: "Bearer my-api-token",
      },
      query: {
        prompt: "tell me a joke",
      },
    });
  },
};

Header configuration hierarchy

The Universal Endpoint allows you to set fallback models or providers and customize headers for each provider or request. You can configure headers at three levels:

  1. Provider level: Headers specific to a particular provider.
  2. Request level: Headers included in individual requests.
  3. Gateway settings: Default headers configured in your gateway dashboard.

Since the same settings can be configured in multiple locations, AI Gateway applies a hierarchy to determine which configuration takes precedence:

  • Provider-level headers override all other configurations.
  • Request-level headers are used if no provider-level headers are set.
  • Gateway-level settings are used only if no headers are configured at the provider or request levels.

This hierarchy ensures consistent behavior, prioritizing the most specific configurations. Use provider-level and request-level headers for fine-tuned control, and gateway settings for general defaults.

Hierarchy example

This example demonstrates how headers set at different levels impact caching behavior:

  • Request-level header: The cf-aig-cache-ttl is set to 3600 seconds, applying this caching duration to the request by default.
  • Provider-level header: For the fallback provider (OpenAI), cf-aig-cache-ttl is explicitly set to 0 seconds, overriding the request-level header and disabling caching for responses when OpenAI is used as the provider.

This shows how provider-level headers take precedence over request-level headers, allowing for granular control of caching behavior.

Request
curl https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id} \
  --header 'Content-Type: application/json' \
  --header 'cf-aig-cache-ttl: 3600' \
  --data '[
    {
      "provider": "workers-ai",
      "endpoint": "@cf/meta/llama-3.1-8b-instruct",
      "headers": {
        "Authorization": "Bearer {cloudflare_token}",
        "Content-Type": "application/json"
      },
      "query": {
        "messages": [
          {
            "role": "system",
            "content": "You are a friendly assistant"
          },
          {
            "role": "user",
            "content": "What is Cloudflare?"
          }
        ]
      }
    },
    {
      "provider": "openai",
      "endpoint": "chat/completions",
      "headers": {
        "Authorization": "Bearer {open_ai_token}",
        "Content-Type": "application/json",
        "cf-aig-cache-ttl": "0"
      },
      "query": {
        "model": "gpt-4o-mini",
        "stream": true,
        "messages": [
          {
            "role": "user",
            "content": "What is Cloudflare?"
          }
        ]
      }
    }
  ]'