---
title: Using BigQuery with Workers AI
description: Learn how to ingest data stored outside of Cloudflare as an input to Workers AI models.
image: https://developers.cloudflare.com/dev-products-preview.png
---

> Documentation Index  
> Fetch the complete documentation index at: https://developers.cloudflare.com/workers-ai/llms.txt  
> Use this file to discover all available pages before exploring further.

[Skip to content](#%5Ftop) 

### Tags

[ AI ](https://developers.cloudflare.com/search/?tags=AI)[ JavaScript ](https://developers.cloudflare.com/search/?tags=JavaScript) 

# Using BigQuery with Workers AI

**Last reviewed:**  over 1 year ago 

The easiest way to get started with [Workers AI](https://developers.cloudflare.com/workers-ai/) is to try it out in the [Multi-modal Playground ↗](https://multi-modal.ai.cloudflare.com/) and the [LLM playground ↗](https://playground.ai.cloudflare.com/). If you decide that you want to integrate your code with Workers AI, you may then decide to use its [REST API endpoints](https://developers.cloudflare.com/workers-ai/get-started/rest-api/) or a [Worker binding](https://developers.cloudflare.com/workers-ai/configuration/bindings/).

But what about the data? What if you want these models to ingest data that is stored outside Cloudflare?

In this tutorial, you will learn how to bring data from Google BigQuery to a Cloudflare Worker so that it can be used as input for Workers AI models.

## Prerequisites

You will need:

* A [Cloudflare Worker](https://developers.cloudflare.com/workers/) project running a [Hello World script](https://developers.cloudflare.com/workers/get-started/guide/).
* A Google Cloud Platform [service account ↗](https://cloud.google.com/iam/docs/service-accounts-create#iam-service-accounts-create-console) with an [associated key ↗](https://cloud.google.com/iam/docs/keys-create-delete#iam-service-account-keys-create-console) file downloaded that has read access to BigQuery.
* Access to a BigQuery table with some test data that allows you to create a [BigQuery Job Query ↗](https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query). For this tutorial it is recommended you that you create your own table as [sampled tables ↗](https://cloud.google.com/bigquery/public-data#sample%5Ftables), unless cloned to your own GCP namespace, won't allow you to run job queries against them. For this example, the [Hacker News Corpus ↗](https://www.kaggle.com/datasets/hacker-news/hacker-news-corpus) was used under its MIT licence.

## 1\. Set up your Cloudflare Worker

To ingest the data into Cloudflare and feed it into Workers AI, you will be using a [Cloudflare Worker](https://developers.cloudflare.com/workers/). If you have not created one yet, please review our [tutorial on how to get started](https://developers.cloudflare.com/workers/get-started/).

After following the steps to create a Worker, you should have the following code in your new Worker project:

JavaScript

```

export default {

  async fetch(request, env, ctx) {

    return new Response("Hello World!");

  },

};


```

If the Worker project has successfully been created, you should also be able to run `npx wrangler dev` in a console to run the Worker locally:

Terminal window

```

[wrangler:inf] Ready on http://localhost:8787


```

Open a browser tab at `http://localhost:8787/` to see your deployed Worker. Please note that the port `8787` may be a different one in your case.

You should be seeing `Hello World!` in your browser:

Terminal window

```

Hello World!


```

If you run into any issues during this step, please review the [Worker's Get Started Guide](https://developers.cloudflare.com/workers/get-started/guide/).

## 2\. Import GCP Service key into the Worker as Secrets

Now that you have verified that the Worker has been created successfully, you will need to reference the Google Cloud Platform service key created in the [Prerequisites](#prerequisites) section of this tutorial.

Your downloaded key JSON file from Google Cloud Platform should have the following format:

```

{

  "type": "service_account",

  "project_id": "<your_project_id>",

  "private_key_id": "<your_private_key_id>",

  "private_key": "<your_private_key>",

  "client_email": "<your_service_account_id>@<your_project_id>.iam.gserviceaccount.com",

  "client_id": "<your_oauth2_client_id>",

  "auth_uri": "https://accounts.google.com/o/oauth2/auth",

  "token_uri": "https://oauth2.googleapis.com/token",

  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",

  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/<your_service_account_id>%40<your_project_id>.iam.gserviceaccount.com",

  "universe_domain": "googleapis.com"

}


```

For this tutorial, you will only need the values of the following fields: `client_email`, `private_key`, `private_key_id`, and `project_id`.

Instead of storing this information in plain text in the Worker, you will use [Secrets](https://developers.cloudflare.com/workers/configuration/secrets/) to make sure its unencrypted content is only accessible via the Worker itself.

Import those three values from the JSON file into Secrets, starting with the field from the JSON key file called `client_email`, which we will now call `BQ_CLIENT_EMAIL` (you can use another variable name):

Terminal window

```

npx wrangler secret put BQ_CLIENT_EMAIL


```

You will be asked to enter a secret value, which will be the value of the field `client_email` in the JSON key file.

Note

Do not include any double quotes in the secret that you store, as it will already be interpreted as a string.

If the secret was uploaded successfully, the following message will be displayed:

Terminal window

```

✨ Success! Uploaded secret BQ_CLIENT_EMAIL


```

Now import the secrets for the three remaining fields; `private_key`, `private_key_id`, and `project_id` as `BQ_PRIVATE_KEY`, `BQ_PRIVATE_KEY_ID`, and `BQ_PROJECT_ID` respectively:

Terminal window

```

npx wrangler secret put BQ_PRIVATE_KEY


```

Terminal window

```

npx wrangler secret put BQ_PRIVATE_KEY_ID


```

Terminal window

```

npx wrangler secret put BQ_PROJECT_ID


```

At this point, you have successfully imported three fields from the JSON key file downloaded from Google Cloud Platform into Cloudflare Secrets to be used in a Worker.

[Secrets](https://developers.cloudflare.com/workers/configuration/secrets/) are only made available to Workers once they are deployed. To make them available during development, [create a .dev.vars](https://developers.cloudflare.com/workers/configuration/secrets/#local-development-with-secrets) file to locally store these credentials and reference them as environment variables.

Your `dev.vars` file should look like the following:

```

BQ_CLIENT_EMAIL="<your_service_account_id>@<your_project_id>.iam.gserviceaccount.com"

BQ_CLIENT_KEY="-----BEGIN PRIVATE KEY-----<content_of_your_private_key>-----END PRIVATE KEY-----\n"

BQ_PRIVATE_KEY_ID="<your_private_key_id>"

BQ_PROJECT_ID="<your_project_id>"


```

Make sure to include `.dev.vars` in your project `.gitignore` file to prevent your credentials being uploaded to a repository when using version control.

Check the secrets are loaded correctly in `src/index.js` by logging their values into a console output, as follows:

JavaScript

```

export default {

  async fetch(request, env, ctx) {

    console.log("BQ_CLIENT_EMAIL: ", env.BQ_CLIENT_EMAIL);

    console.log("BQ_PRIVATE_KEY: ", env.BQ_PRIVATE_KEY);

    console.log("BQ_PRIVATE_KEY_ID: ", env.BQ_PRIVATE_KEY_ID);

    console.log("BQ_PROJECT_ID: ", env.BQ_PROJECT_ID);

    return new Response("Hello World!");

  },

};


```

Restart the Worker and run `npx wrangler dev`. You should see that the server now mentions the newly added variables:

```

Using vars defined in .dev.vars

Your worker has access to the following bindings:

- Vars:

  - BQ_CLIENT_EMAIL: "(hidden)"

  - BQ_PRIVATE_KEY: "(hidden)"

  - BQ_PRIVATE_KEY_ID: "(hidden)"

  - BQ_PROJECT_ID: "(hidden)"

[wrangler:inf] Ready on http://localhost:8787


```

If you open `http://localhost:8787` in your browser, you should see the values of the variables show up in your console where the `npx wrangler dev` command is running, while still seeing only the `Hello World!` text in the browser window.

You now have access to the GCP credentials from a Worker. Next, you will install a library to help with the creation of the JSON Web Token needed to interact with GCP's API.

## 3\. Install library to handle JWT operations

To interact with BigQuery's REST API, you will need to generate a [JSON Web Token ↗](https://jwt.io/introduction) to authenticate your requests using the credentials that you have loaded into Worker secrets in the previous step.

For this tutorial, you will be using the [jose ↗](https://www.npmjs.com/package/jose?activeTab=readme) library for JWT-related operations. Install it by running the following command in a console:

Terminal window

```

npm i jose


```

To verify that the installation succeeded, you can run `npm list`, which lists all the installed packages, to check if the `jose` dependency has been added:

Terminal window

```

<project_name>@0.0.0

/<path_to_your_project>/<project_name>

├── @cloudflare/vitest-pool-workers@0.4.29

├── jose@5.9.2

├── vitest@1.5.0

└── wrangler@3.75.0


```

## 4\. Generate JSON web token

Now that you have installed the `jose` library, it is time to import it and add a function to your code that generates a signed JSON Web Token (JWT):

JavaScript

```

import * as jose from 'jose';

...

const generateBQJWT = async (aCryptoKey, env) => {

const algorithm = "RS256";

const audience = "https://bigquery.googleapis.com/";

const expiryAt = (new Date().valueOf() / 1000);

  const privateKey = await jose.importPKCS8(env.BQ_PRIVATE_KEY, algorithm);


  // Generate signed JSON Web Token (JWT)

  return new jose.SignJWT()

      .setProtectedHeader({

          typ: 'JWT',

          alg: algorithm,

          kid: env.BQ_PRIVATE_KEY_ID

      })

      .setIssuer(env.BQ_CLIENT_EMAIL)

      .setSubject(env.BQ_CLIENT_EMAIL)

      .setAudience(audience)

      .setExpirationTime(expiryAt)

      .setIssuedAt()

      .sign(privateKey)

}


export default {

  async fetch(request, env, ctx) {

       ...

// Create JWT to authenticate the BigQuery API call

      let bqJWT;

      try {

          bqJWT = await generateBQJWT(env);

      } catch (e) {

          return new Response('An error has occurred while generating the JWT', { status: 500 })

      }

  },

       ...

};


```

Now that you have created a JWT, it is time to do an API call to BigQuery to fetch some data.

## 5\. Make authenticated requests to Google BigQuery

With the JWT token created in the previous step, issue an API request to BigQuery's API to retrieve data from a table.

You will now query the table that you created in BigQuery earlier in this tutorial. This example uses a sampled version of the [Hacker News Corpus ↗](https://www.kaggle.com/datasets/hacker-news/hacker-news-corpus) that was used under its MIT licence and uploaded to BigQuery.

JavaScript

```

const queryBQ = async (bqJWT, path) => {

  const bqEndpoint = `https://bigquery.googleapis.com${path}`

  // In this example, text is a field in the BigQuery table that is being queried (hn.news_sampled)

  const query = 'SELECT text FROM hn.news_sampled LIMIT 3';

  const response = await fetch(bqEndpoint, {

      method: "POST",

      body: JSON.stringify({

          "query": query

      }),

      headers: {

          Authorization: `Bearer ${bqJWT}`

      }

  })

  return response.json()

}

...

export default {

  async fetch(request, env, ctx) {

    ...

        let ticketInfo;

        try {

        ticketInfo = await queryBQ(bqJWT);

      } catch (e) {

          return new Response('An error has occurred while querying BQ', { status: 500 });

      }

  ...

  },

};


```

Having the raw row data from BigQuery means that you can now format it in a JSON-like style next.

## 6\. Format results from the query

Now that you have retrieved the data from BigQuery, your BigQuery API response should look something like this:

```

{

  ...

  "schema": {

      "fields": [

          {

              "name": "title",

              "type": "STRING",

              "mode": "NULLABLE"

          },

          {

              "name": "text",

              "type": "STRING",

              "mode": "NULLABLE"

          }

      ]

  },

  ...

  "rows": [

      {

          "f": [

              {

                  "v": "<some_value>"

              },

              {

                  "v": "<some_value>"

              }

          ]

      },

      {

          "f": [

              {

                  "v": "<some_value>"

              },

              {

                  "v": "<some_value>"

              }

          ]

      },

      {

          "f": [

              {

                  "v": "<some_value>"

              },

              {

                  "v": "<some_value>"

              }

          ]

      }

  ],

  ...

}


```

This format may be difficult to read and work with when iterating through results. So you will now implement a function that maps the schema into each individual value, and the resulting output will be easier to read, as shown below. Each row corresponds to an object within an array.

JavaScript

```

[

  {

    title: "<some_value>",

    text: "<some_value>",

  },

  {

    title: "<some_value>",

    text: "<some_value>",

  },

  {

    title: "<some_value>",

    text: "<some_value>",

  },

];


```

Create a `formatRows` function that takes a number of rows and fields returned from the BigQuery response body and returns an array of results as objects with named fields.

JavaScript

```

const formatRows = (rowsWithoutFieldNames, fields) => {

  // Index to fieldName

  const fieldsByIndex = new Map();


  // Load all fields by name and have their index in the array result as their key

  fields.forEach((field, index) => {

      fieldsByIndex.set(index, field.name)

  })


  // Iterate through rows

  const rowsWithFieldNames = rowsWithoutFieldNames.map(row => {

      // Per each row represented by an array f, iterate through the unnamed values and find their field names by searching them in the fieldsByIndex.

      let newRow = {}

      row.f.forEach((field, index) => {

          const fieldName = fieldsByIndex.get(index);

          if (fieldName) {

    // For every field in a row, add them to newRow

              newRow = ({ ...newRow, [fieldName]: field.v });

          }

      })

      return newRow

  })


  return rowsWithFieldNames

}


export default {

  async fetch(request, env, ctx) {

    ...

      // Transform output format into array of objects with named fields

      let formattedResults;


      if ('rows' in ticketInfo) {

          formattedResults = formatRows(ticketInfo.rows, ticketInfo.schema.fields);

          console.log(formattedResults)

      } else if ('error' in ticketInfo) {

          return new Response(ticketInfo.error.message, { status: 500 })

      }

  ...

  },

};


```

## 7\. Feed data into Workers AI

Now that you have converted the response from the BigQuery API into an array of results, generate some tags and attach an associated sentiment score using an LLM via [Workers AI](https://developers.cloudflare.com/workers-ai/):

JavaScript

```

const generateTags = (data, env) => {

  return env.AI.run("@cf/meta/llama-3.1-8b-instruct", {

      prompt: `Create three one-word tags for the following text. return only these three tags separated by a comma. don't return text that is not a category.Lowercase only. ${JSON.stringify(data)}`,

  });

}


const generateSentimentScore = (data, env) => {

  return env.AI.run("@cf/meta/llama-3.1-8b-instruct", {

      prompt: `return a float number between 0 and 1 measuring the sentiment of the following text. 0 being negative and 1 positive. return only the number, no text. ${JSON.stringify(data)}`,

  });

}


// Iterates through values, sends them to an AI handler and encapsulates all responses into a single Promise

const getAIGeneratedContent = (data, env, aiHandler) => {

  let results = data?.map(dataPoint => {

      return aiHandler(dataPoint, env)

  })

  return Promise.all(results)

}

...

export default {

  async fetch(request, env, ctx) {

    ...

let summaries, sentimentScores;

      try {

          summaries = await getAIGeneratedContent(formattedResults, env, generateTags);

          sentimentScores = await getAIGeneratedContent(formattedResults, env, generateSentimentScore)

      } catch {

          return new Response('There was an error while generating the text summaries or sentiment scores')

      }

},


formattedResults = formattedResults?.map((formattedResult, i) => {

          if (sentimentScores[i].response && summaries[i].response) {

              return {

                  ...formattedResult,

                  'sentiment': parseFloat(sentimentScores[i].response).toFixed(2),

                  'tags': summaries[i].response.split(',').map((result) => result.trim())

              }

          }

      }

};


```

Uncomment the following lines from the Wrangler file in your project:

* [  wrangler.jsonc ](#tab-panel-8958)
* [  wrangler.toml ](#tab-panel-8959)

JSONC

```

{

  "ai": {

    "binding": "AI"

  }

}


```

TOML

```

[ai]

binding = "AI"


```

Restart the Worker that is running locally, and after doing so, go to your application endpoint:

Terminal window

```

curl http://localhost:8787


```

It is likely that you will be asked to log in to your Cloudflare account and grant temporary access to Wrangler (the Cloudflare CLI) to use your account when using Worker AI.

Once you access `http://localhost:8787` you should see an output similar to the following:

Terminal window

```

{

  "data": [

  {

    "text": "You can see a clear spike in submissions right around US Thanksgiving.",

    "sentiment": "0.61",

    "tags": [

      "trends",

      "submissions",

      "thanksgiving"

    ]

  },

  {

    "text": "I didn't test the changes before I published them.  I basically did development on the running server. In fact for about 30 seconds the comments page was broken due to a bug.",

    "sentiment": "0.35",

    "tags": [

      "software",

      "deployment",

      "error"

    ]

  },

  {

    "text": "I second that. As I recall, it's a very enjoyable 700-page brain dump by someone who's really into his subject. The writing has a personal voice; there are lots of asides, dry wit, and typos that suggest restrained editing. The discussion is intelligent and often theoretical (and Bartle is not scared to use mathematical metaphors), but the tone is not academic.",

    "sentiment": "0.86",

    "tags": [

      "review",

      "game",

      "design"

    ]

  }

  ]

}


```

The actual values and fields will mostly depend on the query made in Step 5 that is then fed into the LLM.

## Final result

All the code shown in the different steps is combined into the following code in `src/index.js`:

JavaScript

```

import * as jose from "jose";


const generateBQJWT = async (env) => {

  const algorithm = "RS256";

  const audience = "https://bigquery.googleapis.com/";

  const expiryAt = new Date().valueOf() / 1000;

  const privateKey = await jose.importPKCS8(env.BQ_PRIVATE_KEY, algorithm);


  // Generate signed JSON Web Token (JWT)

  return new jose.SignJWT()

    .setProtectedHeader({

      typ: "JWT",

      alg: algorithm,

      kid: env.BQ_PRIVATE_KEY_ID,

    })

    .setIssuer(env.BQ_CLIENT_EMAIL)

    .setSubject(env.BQ_CLIENT_EMAIL)

    .setAudience(audience)

    .setExpirationTime(expiryAt)

    .setIssuedAt()

    .sign(privateKey);

};


const queryBQ = async (bgJWT, path) => {

  const bqEndpoint = `https://bigquery.googleapis.com${path}`;

  const query = "SELECT text FROM hn.news_sampled LIMIT 3";

  const response = await fetch(bqEndpoint, {

    method: "POST",

    body: JSON.stringify({

      query: query,

    }),

    headers: {

      Authorization: `Bearer ${bgJWT}`,

    },

  });

  return response.json();

};


const formatRows = (rowsWithoutFieldNames, fields) => {

  // Index to fieldName

  const fieldsByIndex = new Map();


  fields.forEach((field, index) => {

    fieldsByIndex.set(index, field.name);

  });


  const rowsWithFieldNames = rowsWithoutFieldNames.map((row) => {

    // Map rows into an array of objects with field names

    let newRow = {};

    row.f.forEach((field, index) => {

      const fieldName = fieldsByIndex.get(index);

      if (fieldName) {

        newRow = { ...newRow, [fieldName]: field.v };

      }

    });

    return newRow;

  });


  return rowsWithFieldNames;

};


const generateTags = (data, env) => {

  return env.AI.run("@cf/meta/llama-3.1-8b-instruct", {

    prompt: `Create three one-word tags for the following text. return only these three tags separated by a comma. don't return text that is not a category.Lowercase only. ${JSON.stringify(data)}`,

  });

};


const generateSentimentScore = (data, env) => {

  return env.AI.run("@cf/meta/llama-3.1-8b-instruct", {

    prompt: `return a float number between 0 and 1 measuring the sentiment of the following text. 0 being negative and 1 positive. return only the number, no text. ${JSON.stringify(data)}`,

  });

};


const getAIGeneratedContent = (data, env, aiHandler) => {

  let results = data?.map((dataPoint) => {

    return aiHandler(dataPoint, env);

  });

  return Promise.all(results);

};


export default {

  async fetch(request, env, ctx) {

    // Create JWT to authenticate the BigQuery API call

    let bqJWT;

    try {

      bqJWT = await generateBQJWT(env);

    } catch (error) {

      console.log(error);

      return new Response("An error has occurred while generating the JWT", {

        status: 500,

      });

    }


    // Fetch results from BigQuery

    let ticketInfo;

    try {

      ticketInfo = await queryBQ(

        bqJWT,

        `/bigquery/v2/projects/${env.BQ_PROJECT_ID}/queries`,

      );

    } catch (error) {

      console.log(error);

      return new Response("An error has occurred while querying BQ", {

        status: 500,

      });

    }


    // Transform output format into array of objects with named fields

    let formattedResults;

    if ("rows" in ticketInfo) {

      formattedResults = formatRows(ticketInfo.rows, ticketInfo.schema.fields);

    } else if ("error" in ticketInfo) {

      return new Response(ticketInfo.error.message, { status: 500 });

    }


    // Generate AI summaries and sentiment scores

    let summaries, sentimentScores;

    try {

      summaries = await getAIGeneratedContent(

        formattedResults,

        env,

        generateTags,

      );

      sentimentScores = await getAIGeneratedContent(

        formattedResults,

        env,

        generateSentimentScore,

      );

    } catch {

      return new Response(

        "There was an error while generating the text summaries or sentiment scores",

      );

    }


    // Add AI summaries and sentiment scores to previous results

    formattedResults = formattedResults?.map((formattedResult, i) => {

      if (sentimentScores[i].response && summaries[i].response) {

        return {

          ...formattedResult,

          sentiment: parseFloat(sentimentScores[i].response).toFixed(2),

          tags: summaries[i].response.split(",").map((result) => result.trim()),

        };

      }

    });


    const response = { data: formattedResults };


    return new Response(JSON.stringify(response), {

      headers: { "Content-Type": "application/json" },

    });

  },

};


```

If you wish to deploy this Worker, you can do so by running `npx wrangler deploy`:

Terminal window

```

Total Upload: <size_of_your_worker> KiB / gzip: <compressed_size_of_your_worker> KiB

Uploaded <name_of_your_worker> (x sec)

Deployed <name_of_your_worker> triggers (x sec)

  https://<your_public_worker_endpoint>

Current Version ID: <worker_script_version_id>


```

This will create a public endpoint that you can use to access the Worker globally. Please keep this in mind when using production data, and make sure to include additional access controls in place.

## Conclusion

In this tutorial, you have learnt how to integrate Google BigQuery and Cloudflare Workers by creating a GCP service account key and storing part of it as Worker secrets. This was later imported in the code, and by using the `jose` npm library, you created a JSON Web Token to authenticate the API query to BigQuery.

Once you obtained the results, you formatted them to pass to generative AI models via Workers AI to generate tags and to perform sentiment analysis on the extracted data.

## Next Steps

If, instead of displaying the results of ingesting the data to the AI model in a browser, your workflow requires fetching and store data (for example in [R2](https://developers.cloudflare.com/r2/) or [D1](https://developers.cloudflare.com/d1/)) on regular intervals, you may want to consider adding a [scheduled handler](https://developers.cloudflare.com/workers/runtime-apis/handlers/scheduled/) for this Worker. This enables you to trigger the Worker with a predefined cadence via a [Cron Trigger](https://developers.cloudflare.com/workers/configuration/cron-triggers/). Consider reviewing the Reference Architecture Diagrams on [Ingesting BigQuery Data into Workers AI](https://developers.cloudflare.com/reference-architecture/diagrams/ai/bigquery-workers-ai/).

A use case to ingest data from other sources, like you did in this tutorial, is to create a RAG system. If this sounds relevant to you, please check out the [Build a Retrieval Augmented Generation (RAG) AI tutorial](https://developers.cloudflare.com/workers-ai/guides/tutorials/build-a-retrieval-augmented-generation-ai/).

To learn more about what other AI models you can use at Cloudflare, please visit the [Workers AI](https://developers.cloudflare.com/workers-ai) section of our docs.

```json
{"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"/directory/","name":"Directory"}},{"@type":"ListItem","position":2,"item":{"@id":"/workers-ai/","name":"Workers AI"}},{"@type":"ListItem","position":3,"item":{"@id":"/workers-ai/guides/","name":"Guides"}},{"@type":"ListItem","position":4,"item":{"@id":"/workers-ai/guides/tutorials/","name":"Tutorials"}},{"@type":"ListItem","position":5,"item":{"@id":"/workers-ai/guides/tutorials/using-bigquery-with-workers-ai/","name":"Using BigQuery with Workers AI"}}]}
```
