---
title: Build a Durable AI Agent
description: Build a durable AI agent using Cloudflare Workflows that researches GitHub repositories with automatic retries.
image: https://developers.cloudflare.com/dev-products-preview.png
---

> Documentation Index  
> Fetch the complete documentation index at: https://developers.cloudflare.com/workflows/llms.txt  
> Use this file to discover all available pages before exploring further.

[Skip to content](#%5Ftop) 

# Build a Durable AI Agent

In this guide, you will build an AI agent that researches GitHub repositories. Give it a task like "Compare open-source LLM projects" and it will:

1. Search GitHub for relevant repositories
2. Fetch details about each one (stars, forks, activity)
3. Analyze and compare them
4. Return a recommendation

Each LLM call and tool call becomes a step — a self-contained, individually retryable unit of work. If any step fails, Workflows retries it automatically. If the entire Workflow crashes mid-task, it resumes from the last successful step.

| Challenge                    | Solution with Workflows                                 |
| ---------------------------- | ------------------------------------------------------- |
| Long-running agent loops     | Durable execution that survives any interruption        |
| Unreliable LLM and API calls | Automatic retry with independent checkpoints            |
| Waiting for human approval   | waitForEvent() pauses for hours or days                 |
| Polling for job completion   | step.sleep() between checks without consuming resources |

This guide uses the [Agents SDK](https://developers.cloudflare.com/agents/) with Workflows for real-time progress updates and the Anthropic SDK for LLM calls. The same patterns apply to any LLM SDK (OpenAI, Google AI, Mistral, etc.).

## Quick start

If you want to skip the steps and pull down the complete agent, utilizing [AI Gateway](https://developers.cloudflare.com/ai-gateway), run the following command:

Terminal window

```

npm create cloudflare@latest -- --template cloudflare/docs-examples/workflows/durableAgent


```

Use this option if you are familiar with Cloudflare Workflows or want to explore the code first.

Follow the steps below to learn how to build a durable AI agent from scratch.

## Prerequisites

1. Sign up for a [Cloudflare account ↗](https://dash.cloudflare.com/sign-up/workers-and-pages).
2. Install [Node.js ↗](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm).

Node.js version manager

Use a Node version manager like [Volta ↗](https://volta.sh/) or [nvm ↗](https://github.com/nvm-sh/nvm) to avoid permission issues and change Node.js versions. [Wrangler](https://developers.cloudflare.com/workers/wrangler/install-and-update/), discussed later in this guide, requires a Node version of `16.17.0` or later.

You will also need an [Anthropic API key ↗](https://platform.claude.com/settings/keys) for LLM calls. New accounts include free credits.

## 1\. Create a new Worker project

1. Create a new Worker project by running the following command:  
 npm  yarn  pnpm  
```  
npm create cloudflare@latest -- durable-ai-agent  
```  
```  
yarn create cloudflare durable-ai-agent  
```  
```  
pnpm create cloudflare@latest durable-ai-agent  
```  
For setup, select the following options:  
   * For _What would you like to start with?_, choose `Hello World example`.  
   * For _Which template would you like to use?_, choose `Worker only`.  
   * For _Which language do you want to use?_, choose `TypeScript`.  
   * For _Do you want to use git for version control?_, choose `Yes`.  
   * For _Do you want to deploy your application?_, choose `No` (we will be making some changes before deploying).
2. Move into your project:  
Terminal window  
```  
cd durable-ai-agent  
```
3. Install dependencies:  
Terminal window  
```  
npm install agents @anthropic-ai/sdk  
```

## 2\. Define your tools

Tools are functions the LLM can call to interact with external systems. You define the schema (what inputs the tool accepts) and the implementation (what it does). The LLM decides when to use each tool based on the task.

1. Create `src/tools.ts` with two complementary tools:  
src/tools.ts  
```  
export interface SearchReposInput {  
  query: string;  
  limit?: number;  
}  
export interface GetRepoInput {  
  owner: string;  
  repo: string;  
}  
interface GitHubSearchResponse {  
  items: Array<{ full_name: string; stargazers_count: number }>;  
}  
interface GitHubRepoResponse {  
  full_name: string;  
  description: string;  
  stargazers_count: number;  
  forks_count: number;  
  open_issues_count: number;  
  language: string;  
  license: { name: string } | null;  
  updated_at: string;  
}  
export const searchReposTool = {  
  name: "search_repos" as const,  
  description:  
    "Search GitHub repositories by keyword. Returns top results. Use get_repo for details.",  
  input_schema: {  
    type: "object" as const,  
    properties: {  
      query: {  
        type: "string",  
        description: "Search query (e.g., 'typescript orm')",  
      },  
      limit: { type: "number", description: "Max results (default 5)" },  
    },  
    required: ["query"],  
  },  
  run: async (input: SearchReposInput): Promise<string> => {  
    const response = await fetch(  
      `https://api.github.com/search/repositories?q=${encodeURIComponent(input.query)}&sort=stars&per_page=${input.limit ?? 5}`,  
      {  
        headers: {  
          Accept: "application/vnd.github+json",  
          "User-Agent": "DurableAgent/1.0",  
        },  
      },  
    );  
    if (!response.ok) return `Search failed: ${response.status}`;  
    const data = await response.json<GitHubSearchResponse>();  
    return JSON.stringify(  
      data.items.map((r) => ({  
        name: r.full_name,  
        stars: r.stargazers_count,  
      })),  
    );  
  },  
};  
export const getRepoTool = {  
  name: "get_repo" as const,  
  description:  
    "Get detailed info about a GitHub repository including stars, forks, and description.",  
  input_schema: {  
    type: "object" as const,  
    properties: {  
      owner: {  
        type: "string",  
        description: "Repository owner (e.g., 'cloudflare')",  
      },  
      repo: {  
        type: "string",  
        description: "Repository name (e.g., 'workers-sdk')",  
      },  
    },  
    required: ["owner", "repo"],  
  },  
  run: async (input: GetRepoInput): Promise<string> => {  
    const response = await fetch(  
      `https://api.github.com/repos/${input.owner}/${input.repo}`,  
      {  
        headers: {  
          Accept: "application/vnd.github+json",  
          "User-Agent": "DurableAgent/1.0",  
        },  
      },  
    );  
    if (!response.ok) return `Repo not found: ${input.owner}/${input.repo}`;  
    const data = await response.json<GitHubRepoResponse>();  
    return JSON.stringify({  
      name: data.full_name,  
      description: data.description,  
      stars: data.stargazers_count,  
      forks: data.forks_count,  
      issues: data.open_issues_count,  
      language: data.language,  
      license: data.license?.name ?? "None",  
      updated: data.updated_at,  
    });  
  },  
};  
export const tools = [searchReposTool, getRepoTool];  
```

These tools complement each other: `search_repos` finds repositories, and `get_repo` fetches details about specific ones.

## 3\. Write your Workflow

The `AgentWorkflow` class from the Agents SDK extends Cloudflare Workflows with bidirectional Agent communication. Your Workflow can report progress, broadcast to WebSocket clients, and call Agent methods via RPC.

* The [step](https://developers.cloudflare.com/workflows/build/workers-api/#step) object provides methods to define durable steps.
* `step.do(name, callback)` executes code and persists the result. If the Workflow is interrupted, it resumes from the last successful step.
* `this.reportProgress()` sends progress updates to the Agent (non-durable).
* `this.broadcastToClients()` sends messages to all connected WebSocket clients (non-durable).

For a gentler introduction, refer to [Build your first Workflow](https://developers.cloudflare.com/workflows/get-started/guide/).

Create `src/workflow.ts`:

src/workflow.ts

```

import { AgentWorkflow } from "agents/workflows";

import type { AgentWorkflowEvent, AgentWorkflowStep } from "agents/workflows";

import Anthropic from "@anthropic-ai/sdk";

import {

  tools,

  searchReposTool,

  getRepoTool,

  type SearchReposInput,

  type GetRepoInput,

} from "./tools";

import type { ResearchAgent } from "./agent";


type Params = { task: string };


export class ResearchWorkflow extends AgentWorkflow<ResearchAgent, Params> {

  async run(event: AgentWorkflowEvent<Params>, step: AgentWorkflowStep) {

    const client = new Anthropic({ apiKey: this.env.ANTHROPIC_API_KEY });


    const messages: Anthropic.MessageParam[] = [

      { role: "user", content: event.payload.task },

    ];


    const toolDefinitions = tools.map(({ run, ...rest }) => rest);


    // Durable agent loop - each turn is checkpointed

    for (let turn = 0; turn < 10; turn++) {

      // Report progress to Agent and connected clients

      await this.reportProgress({

        step: `llm-turn-${turn}`,

        status: "running",

        percent: turn / 10,

        message: `Processing turn ${turn + 1}...`,

      });


      const response = (await step.do(

        `llm-turn-${turn}`,

        { retries: { limit: 3, delay: "10 seconds", backoff: "exponential" } },

        async () => {

          const msg = await client.messages.create({

            model: "claude-sonnet-4-5-20250929",

            max_tokens: 4096,

            tools: toolDefinitions,

            messages,

          });

          // Serialize for Workflow state

          return JSON.parse(JSON.stringify(msg));

        },

      )) as Anthropic.Message;


      if (!response || !response.content) continue;


      messages.push({ role: "assistant", content: response.content });


      if (response.stop_reason === "end_turn") {

        const textBlock = response.content.find(

          (b): b is Anthropic.TextBlock => b.type === "text",

        );

        const result = {

          status: "complete",

          turns: turn + 1,

          result: textBlock?.text ?? null,

        };


        // Report completion (durable)

        await step.reportComplete(result);

        return result;

      }


      const toolResults: Anthropic.ToolResultBlockParam[] = [];


      for (const block of response.content) {

        if (block.type !== "tool_use") continue;


        // Broadcast tool execution to clients

        this.broadcastToClients({

          type: "tool_call",

          tool: block.name,

          turn,

        });


        const result = await step.do(

          `tool-${turn}-${block.id}`,

          { retries: { limit: 2, delay: "5 seconds" } },

          async () => {

            switch (block.name) {

              case "search_repos":

                return searchReposTool.run(block.input as SearchReposInput);

              case "get_repo":

                return getRepoTool.run(block.input as GetRepoInput);

              default:

                return `Unknown tool: ${block.name}`;

            }

          },

        );


        toolResults.push({

          type: "tool_result",

          tool_use_id: block.id,

          content: result,

        });

      }


      messages.push({ role: "user", content: toolResults });

    }


    return { status: "max_turns_reached", turns: 10 };

  }

}


```

Why separate steps for LLM and tools?

Each `step.do()` creates a checkpoint. If your Workflow crashes or the Worker restarts:

* **After LLM step**: The response is persisted. On resume, it skips the LLM call and moves to tool execution.
* **After tool step**: The result is persisted. If a later tool fails, earlier tools do not re-run.

This is especially important for:

* **LLM calls**: Expensive and slow, should not repeat unnecessarily
* **External APIs**: May have rate limits or side effects
* **Idempotency**: Some tools (like sending emails) should not run twice

## 4\. Write your Agent

The Agent handles HTTP requests, WebSocket connections, and Workflow lifecycle events. It triggers a workflow instance `runWorkflow()` and receives progress updates via callbacks.

Create `src/agent.ts`:

src/agent.ts

```

import { Agent } from "agents";


type State = {

  currentWorkflow?: string;

  status?: string;

};


export class ResearchAgent extends Agent<Env, State> {

  initialState: State = {};


  // Start a research task - called via HTTP or WebSocket

  async startResearch(task: string) {

    const instanceId = await this.runWorkflow("RESEARCH_WORKFLOW", { task });

    this.setState({

      ...this.state,

      currentWorkflow: instanceId,

      status: "running",

    });

    return { instanceId };

  }


  // Get status of a workflow

  async getResearchStatus(instanceId: string) {

    return this.getWorkflow(instanceId);

  }


  // Called when workflow reports progress

  async onWorkflowProgress(

    workflowName: string,

    instanceId: string,

    progress: unknown,

  ) {

    // Broadcast to all connected WebSocket clients

    this.broadcast(JSON.stringify({ type: "progress", instanceId, progress }));

  }


  // Called when workflow completes

  async onWorkflowComplete(

    workflowName: string,

    instanceId: string,

    result?: unknown,

  ) {

    this.setState({ ...this.state, status: "complete" });

    this.broadcast(JSON.stringify({ type: "complete", instanceId, result }));

  }


  // Called when workflow errors

  async onWorkflowError(

    workflowName: string,

    instanceId: string,

    error: string,

  ) {

    this.setState({ ...this.state, status: "error" });

    this.broadcast(JSON.stringify({ type: "error", instanceId, error }));

  }

}


```

## 5\. Configure your project

1. Open `wrangler.jsonc` and add the Agent and Workflow configuration:  
   * [  wrangler.jsonc ](#tab-panel-10640)  
   * [  wrangler.toml ](#tab-panel-10641)  
JSONC  
```  
{  
  "$schema": "node_modules/wrangler/config-schema.json",  
  "name": "durable-ai-agent",  
  "main": "src/index.ts",  
  // Set this to today's date  
  "compatibility_date": "2026-05-08",  
  "observability": {  
    "enabled": true  
  },  
  "durable_objects": {  
    "bindings": [  
      {  
        "name": "ResearchAgent",  
        "class_name": "ResearchAgent"  
      }  
    ]  
  },  
  "workflows": [  
    {  
      "name": "research-workflow",  
      "binding": "RESEARCH_WORKFLOW",  
      "class_name": "ResearchWorkflow"  
    }  
  ],  
  "migrations": [  
    {  
      "tag": "v1",  
      "new_sqlite_classes": ["ResearchAgent"]  
    }  
  ]  
}  
```  
TOML  
```  
"$schema" = "node_modules/wrangler/config-schema.json"  
name = "durable-ai-agent"  
main = "src/index.ts"  
# Set this to today's date  
compatibility_date = "2026-05-08"  
[observability]  
enabled = true  
[[durable_objects.bindings]]  
name = "ResearchAgent"  
class_name = "ResearchAgent"  
[[workflows]]  
name = "research-workflow"  
binding = "RESEARCH_WORKFLOW"  
class_name = "ResearchWorkflow"  
[[migrations]]  
tag = "v1"  
new_sqlite_classes = [ "ResearchAgent" ]  
```
2. Generate types for your bindings:  
Terminal window  
```  
npx wrangler types  
```  
This creates a `worker-configuration.d.ts` file with the `Env` type that includes your bindings.

## 6\. Write your API

The Worker routes requests to the Agent, which manages workflow lifecycle. Use `routeAgentRequest()` for WebSocket connections and `getAgentByName()` for server-side RPC calls.

Replace `src/index.ts`:

src/index.ts

```

import { getAgentByName, routeAgentRequest } from "agents";


export { ResearchAgent } from "./agent";

export { ResearchWorkflow } from "./workflow";


export default {

  async fetch(request: Request, env: Env): Promise<Response> {

    const url = new URL(request.url);


    // Route WebSocket connections to /agents/research-agent/{name}

    const agentResponse = await routeAgentRequest(request, env);

    if (agentResponse) return agentResponse;


    // HTTP API for starting research tasks

    if (request.method === "POST" && url.pathname === "/research") {

      const { task, agentId } = await request.json<{

        task: string;

        agentId?: string;

      }>();


      // Get agent instance by name (creates if doesn't exist)

      const agent = await getAgentByName(

        env.ResearchAgent,

        agentId ?? "default",

      );


      // Start the research workflow via RPC

      const result = await agent.startResearch(task);

      return Response.json(result);

    }


    // Check workflow status

    if (url.pathname === "/status") {

      const instanceId = url.searchParams.get("instanceId");

      const agentId = url.searchParams.get("agentId") ?? "default";


      if (!instanceId) {

        return Response.json({ error: "instanceId required" }, { status: 400 });

      }


      const agent = await getAgentByName(env.ResearchAgent, agentId);

      const status = await agent.getResearchStatus(instanceId);


      return Response.json(status);

    }


    return new Response("POST /research with { task } to start", {

      status: 400,

    });

  },

} satisfies ExportedHandler<Env>;


```

## 7\. Develop locally

1. Create a [.env file](https://developers.cloudflare.com/workers/wrangler/environments/#secrets-in-local-development) for local development:  
.env  
```  
ANTHROPIC_API_KEY=your-api-key-here  
```
2. Start the dev server:  
Terminal window  
```  
npx wrangler dev  
```
3. Start a research task:  
Terminal window  
```  
curl -X POST http://localhost:8787/research \  
  -H "Content-Type: application/json" \  
  -d '{"task": "Compare open-source LLM projects"}'  
```  
```  
{ "instanceId": "abc-123-def" }  
```
4. Check progress (may take a few seconds to complete):  
Terminal window  
```  
curl "http://localhost:8787/status?instanceId=abc-123-def"  
```

The agent will search for repositories, fetch details, and return a comparison. Progress updates are broadcast to any connected WebSocket clients.

## 8\. Deploy

1. Deploy the Worker:  
Terminal window  
```  
npx wrangler deploy  
```
2. Add your API key as a secret:  
Terminal window  
```  
npx wrangler secret put ANTHROPIC_API_KEY  
```
3. Start a research task on your deployed Worker:  
Terminal window  
```  
curl -X POST https://durable-ai-agent.<your-subdomain>.workers.dev/research \  
  -H "Content-Type: application/json" \  
  -d '{"task": "Compare open-source LLM projects"}'  
```
4. Inspect workflow runs with the CLI:  
Terminal window  
```  
npx wrangler workflows instances describe research-workflow latest  
```  
This shows every step the agent took, including LLM calls, tool executions, timing, and any retries.  
You can also view this in the Cloudflare dashboard under **research-workflow**.  
[ Go to **Workflows** ](https://dash.cloudflare.com/?to=/:account/workers/workflows)

## Real-time client integration

Connect to your Agent via WebSocket to receive real-time progress updates. The `useAgent` hook connects to `/agents/{agent-name}/{instance-name}`:

```

/agents/research-agent/default  → ResearchAgent instance "default"

/agents/research-agent/user-123 → ResearchAgent instance "user-123"


```

```

import { useState } from "react";

import { useAgent } from "agents/react";


function ResearchUI({ agentId = "default" }) {

  const [progress, setProgress] = useState(null);


  const { state } = useAgent({

    agent: "research-agent", // Maps to ResearchAgent class

    name: agentId, // Instance name

    onMessage: (message) => {

      const data = JSON.parse(message.data);

      if (data.type === "progress") {

        setProgress(data.progress);

      }

    },

  });


  return (

    <div>

      {progress && (

        <p>

          {progress.message} ({Math.round(progress.percent * 100)}%)

        </p>

      )}

    </div>

  );

}


```

Agent class names are automatically converted to kebab-case for URLs (`ResearchAgent` → `research-agent`).

## Learn more

[ Agents SDK Workflows ](https://developers.cloudflare.com/agents/api-reference/run-workflows/) Complete API reference for AgentWorkflow, lifecycle callbacks, and bidirectional communication. 

[ Events and parameters ](https://developers.cloudflare.com/workflows/build/events-and-parameters/) Pass data to Workflows and pause for external events with waitForEvent. 

[ Sleeping and retrying ](https://developers.cloudflare.com/workflows/build/sleeping-and-retrying/) Configure retry behavior and sleep patterns. 

[ Workers API ](https://developers.cloudflare.com/workflows/build/workers-api/) Explore the full Workflows API for programmatic control. 

[ Agents SDK ](https://developers.cloudflare.com/agents/) For interactive agents with real-time chat and WebSocket connections. 

```json
{"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"/directory/","name":"Directory"}},{"@type":"ListItem","position":2,"item":{"@id":"/workflows/","name":"Workflows"}},{"@type":"ListItem","position":3,"item":{"@id":"/workflows/get-started/","name":"Get started"}},{"@type":"ListItem","position":4,"item":{"@id":"/workflows/get-started/durable-agents/","name":"Build a Durable AI Agent"}}]}
```