---
title: WebSocket adapter
description: Stream audio and video between WebRTC tracks and WebSocket endpoints using Realtime SFU.
image: https://developers.cloudflare.com/dev-products-preview.png
---

> Documentation Index  
> Fetch the complete documentation index at: https://developers.cloudflare.com/realtime/llms.txt  
> Use this file to discover all available pages before exploring further.

[Skip to content](#%5Ftop) 

# WebSocket adapter

Note

WebSocket adapter is in beta. The API may change.

Stream audio and video between WebRTC tracks and WebSocket endpoints. Supports ingesting audio from WebSocket sources and sending WebRTC audio and video to WebSocket consumers. Video egress is supported as JPEG at approximately 1 FPS.

## What you can build

* AI services with WebSocket APIs for audio processing
* Custom audio processing pipelines
* Legacy system bridges
* Server-side audio generation and consumption
* Video snapshotting and thumbnails
* Computer vision ingestion (low FPS)

## How it works

* [ Ingest (WebSocket → WebRTC) ](#tab-panel-7631)
* [ Stream (WebRTC → WebSocket) ](#tab-panel-7632)

### Create WebRTC tracks from external audio

Ingest audio from external sources via WebSocket to create WebRTC tracks for distribution.

graph LR
    A[External System] -->|Audio Data| B[WebSocket Endpoint]
    B -->|Adapter| C[Realtime SFU]
    C -->|New Session| D[WebRTC Track]
    D -->|WebRTC| E[WebRTC Clients]

**Use cases:**

* AI text-to-speech generation streaming into WebRTC
* Audio from backend services or databases
* Live audio feeds from external systems

**Key characteristics:**

* Creates a new session ID automatically
* Uses `buffer` mode for chunked audio transmission
* Maximum 32 KB per WebSocket message

### Stream WebRTC audio and video to external systems

Stream audio and video from existing WebRTC tracks to external systems via WebSocket for processing or storage.

graph LR
    A[WebRTC Source] -->|WebRTC| B[Realtime SFU Session]
    B -->|Adapter| C[WebSocket Endpoint]
    C -->|Media Data| D[External System]

**Use cases:**

* Real-time speech-to-text transcription
* Audio recording and archival
* Live audio processing pipelines
* Video snapshotting and thumbnails
* Computer vision ingestion (low FPS)

**Key characteristics:**

* Requires existing session ID with track
* Audio: Sends individual PCM frames as they are produced; each includes timestamp and sequence number
* Video: Sends individual JPEG frames at approximately 1 FPS; each includes timestamp (sequence number may be unset)

## API reference

### Create adapter

```

POST /v1/apps/{appId}/adapters/websocket/new


```

* [ Ingest ](#tab-panel-7633)
* [ Stream ](#tab-panel-7634)

#### Request body

```

{

  "tracks": [

    {

      "location": "local",

      "trackName": "string",

      "endpoint": "wss://...",

      "inputCodec": "pcm",

      "mode": "buffer"

    }

  ]

}


```

#### Parameters

| Parameter  | Type   | Description                                                 |
| ---------- | ------ | ----------------------------------------------------------- |
| location   | string | **Required**. Must be "local" for ingesting audio           |
| trackName  | string | **Required**. Name for the new WebRTC track to create       |
| endpoint   | string | **Required**. WebSocket URL to receive audio from           |
| inputCodec | string | **Required**. Codec of incoming audio. Currently only "pcm" |
| mode       | string | **Required**. Must be "buffer" for local mode               |

#### Response

```

{

  "tracks": [

    {

      "trackName": "string",

      "adapterId": "string",

      "sessionId": "string",    // New session ID generated

      "endpoint": "string"      // Echo of the requested endpoint

    }

  ]

}


```

Important

* A new session ID is automatically generated.
* The `sessionId` field in the request is ignored if provided.
* Send audio in chunks up to 32 KB per WebSocket message.

#### Request body

```

{

  "tracks": [

    {

      "location": "remote",

      "sessionId": "string",

      "trackName": "string",

      "endpoint": "wss://...",

      "outputCodec": "pcm"

    }

  ]

}


```

#### Parameters

| Parameter   | Type   | Description                                                                                 |
| ----------- | ------ | ------------------------------------------------------------------------------------------- |
| location    | string | **Required**. Must be "remote" for streaming media out                                      |
| sessionId   | string | **Required**. Existing session ID containing the track                                      |
| trackName   | string | **Required**. Name of the existing track to stream                                          |
| endpoint    | string | **Required**. WebSocket URL to send media to                                                |
| outputCodec | string | **Required**. Codec for outgoing media. Use "pcm" for audio, "jpeg" for video (egress only) |

#### Response

```

{

  "tracks": [

    {

      "trackName": "string",

      "adapterId": "string",

      "sessionId": "string",    // Same as request sessionId

      "endpoint": "string"      // Echo of the requested endpoint

    }

  ]

}


```

Important

* Requires an existing session with the specified track.
* Audio frames are sent individually with timestamp and sequence number.
* Video frames are sent individually as JPEG at approximately 1 FPS with timestamp; sequence number may be unset.
* Each frame is a separate WebSocket message.
* No mode parameter; frames are sent as produced.

### Close adapter

```

POST /v1/apps/{appId}/adapters/websocket/close


```

#### Request body

```

{

  "tracks": [

    {

      "adapterId": "string"

    }

  ]

}


```

## Media formats

### WebRTC tracks

* **Codec**: Opus
* **Sample rate**: 48 kHz
* **Channels**: Stereo

### WebSocket binary format

Media uses Protocol Buffers. Audio uses PCM payloads; video uses JPEG payloads:

* 16-bit signed little-endian PCM
* 48 kHz sample rate
* Stereo (left/right interleaved)
* Video: JPEG image payload (one frame per message)

```

message Packet {

    uint32 sequenceNumber = 1;  // Used in Stream mode only

    uint32 timestamp = 2;       // Used in Stream mode only

    bytes payload = 5;          // Media data

}


```

**Ingest mode (buffer)**: Only the `payload` field is used, containing chunks of audio data.

**Stream mode (egress)**:

* For audio frames:  
   * `sequenceNumber`: Incremental packet counter  
   * `timestamp`: Timestamp for synchronization  
   * `payload`: Individual PCM audio frame data
* For video frames (JPEG):  
   * `timestamp`: Timestamp for synchronization  
   * `payload`: JPEG image data (one frame per message)  
   * Note: `sequenceNumber` may be unset for video frames

### Video (JPEG)

* Supported WebRTC input codecs: H264, H265, VP8, VP9
* Output over WebSocket: JPEG images at approximately 1 FPS

## Connection protocol

Connects to your WebSocket endpoint:

1. WebSocket upgrade handshake
2. Secure connection for `wss://` URLs
3. Media streaming begins

### Message format

#### Buffer mode (ingest)

* **Binary messages**: PCM audio data in chunks
* **Maximum message size**: 32 KB per WebSocket message
* **Important**: Account for serialization overhead when chunking audio buffers
* Send audio in small, frequent chunks rather than large batches

#### Stream mode (egress)

* **Binary messages**: Individual frames with metadata (audio or video)
* Audio frames include:  
   * Timestamp information  
   * Sequence number  
   * PCM audio frame data
* Video frames include:  
   * Timestamp information  
   * JPEG image data  
   * Note: Sequence number may be unset for video frames
* Frames are sent individually as they arrive from the WebRTC track
* Video frames are emitted at approximately 1 FPS

### Connection lifecycle

1. Connects to the WebSocket endpoint
2. Audio streaming begins
3. Video streaming begins (if configured)
4. Connection closes when closed or on error

## Pricing

Currently in beta and free to use.

Once generally available, billing will follow standard Cloudflare Realtime pricing at $0.05 per GB egress. Only traffic originating from Cloudflare towards WebSocket endpoints incurs charges. Traffic ingested from WebSocket endpoints into Cloudflare incurs no charge.

Usage counts towards your Cloudflare Realtime free tier of 1,000 GB.

## Best practices

### Connection management

* Closing an already-closed instance returns success
* Close when sessions end
* Implement reconnection logic for network failures

### Performance

* Deploy WebSocket endpoints close to Cloudflare edge
* Use appropriate buffer sizes
* Monitor connection quality

### Security

* Secure WebSocket endpoints with authentication
* Use `wss://` for production
* Implement rate limiting

## Limitations

* **WebSocket payloads**: PCM (audio) for ingest and stream; JPEG (video) for stream
* **Beta status**: API may change in future releases
* **Video support**: Egress only (JPEG)
* **Video frame rate**: Approximately 1 FPS (beta; not configurable)
* **Unidirectional flow**: Each instance handles one direction

## Error handling

| Error Code | Description                              |
| ---------- | ---------------------------------------- |
| 400        | Invalid request parameters               |
| 404        | Session or track not found               |
| 503        | Adapter not found (for close operations) |

## Reference implementations

* Audio (PCM over WebSocket): [Cloudflare Realtime Examples – ai-tts-stt ↗](https://github.com/cloudflare/realtime-examples/tree/main/ai-tts-stt)
* Video (JPEG egress): [Cloudflare Realtime Examples – video-to-jpeg ↗](https://github.com/cloudflare/realtime-examples/tree/main/video-to-jpeg)

## Migration from custom bridges

1. Replace custom signaling with adapter API calls
2. Update WebSocket endpoints to handle PCM format
3. Implement adapter lifecycle management
4. Remove custom STUN/TURN configuration

## FAQ

**Q: Can I use the same adapter for bidirectional audio?**A: No, each instance is unidirectional. Create separate adapters for send and receive.

**Q: What happens if the WebSocket connection drops?**A: The adapter closes and must be recreated. Implement reconnection logic in your app.

**Q: Is there a limit on concurrent adapters?**A: Limits follow standard Cloudflare Realtime quotas. Contact support for specific requirements.

**Q: Can I change the audio format after creating an adapter?**A: No, audio format is fixed at creation time. Create a new adapter for different formats.

```json
{"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"/directory/","name":"Directory"}},{"@type":"ListItem","position":2,"item":{"@id":"/realtime/","name":"Realtime"}},{"@type":"ListItem","position":3,"item":{"@id":"/realtime/sfu/","name":"Realtime SFU"}},{"@type":"ListItem","position":4,"item":{"@id":"/realtime/sfu/media-transport-adapters/","name":"Media Transport Adapters"}},{"@type":"ListItem","position":5,"item":{"@id":"/realtime/sfu/media-transport-adapters/websocket-adapter/","name":"WebSocket adapter"}}]}
```