Workers AI Models

We found 78 models
kimi-k2.6
Text Generation · Moonshot AI · Hosted

Kimi K2.6 is a frontier-scale open-source 1T parameter model with a 262.1k context window, multi-turn tool calling, vision inputs, and structured outputs for agentic workloads.

  • Function calling
  • Reasoning
  • Vision
glm-4.7-flash
Text Generation · Zhipu AI · Hosted

GLM-4.7-Flash is a fast and efficient multilingual text generation model with a 131,072 token context window. Optimized for dialogue, instruction-following, and multi-turn tool calling across 100+ languages.

  • Function calling
  • Reasoning
gpt-oss-120b
Text Generation · OpenAI · Hosted

One of OpenAI's open-weight models, designed for powerful reasoning, agentic tasks, and versatile developer use cases. gpt-oss-120b targets production, general-purpose, high-reasoning use cases.

  • Function calling
  • Reasoning
llama-4-scout-17b-16e-instruct
Text Generation · Meta · Hosted

Meta's Llama 4 Scout is a natively multimodal model with 17 billion active parameters and 16 experts. Llama 4 models use a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

  • Batch
  • Function calling
  • Vision
gemma-4-26b-a4b-it
Text Generation · Google · Hosted

Gemma 4 is Google's most intelligent family of open models, built from Gemini 3 research to maximize intelligence-per-parameter.

  • Function calling
  • Reasoning
  • Vision
nemotron-3-120b-a12b
Text Generation · NVIDIA · Hosted

NVIDIA Nemotron 3 Super is a hybrid MoE model with leading accuracy for multi-agent applications and specialized agentic AI systems.

  • Function calling
  • Reasoning
kimi-k2.5
Text Generation · Moonshot AI · Hosted

Kimi K2.5 is a frontier-scale open-source model with a 256k context window, multi-turn tool calling, vision inputs, and structured outputs for agentic workloads.

  • Function calling
  • Planned deprecation
  • Reasoning
  • Vision
flux-2-klein-9b
Text-to-Image · Black Forest Labs · Hosted

FLUX.2 [klein] 9B is an ultra-fast, distilled image model with enhanced quality. It unifies image generation and editing in a single model, delivering state-of-the-art quality enabling interactive workflows, real-time previews, and latency-critical applications.

  • Partner
flux-2-klein-4b
Text-to-Image · Black Forest Labs · Hosted

FLUX.2 [klein] is an ultra-fast, distilled image model. It unifies image generation and editing in a single model, delivering state-of-the-art quality enabling interactive workflows, real-time previews, and latency-critical applications.

  • Partner
flux-2-dev
Text-to-Image · Black Forest Labs · Hosted

FLUX.2 [dev] is an image model from Black Forest Labs that generates highly realistic and detailed images, with multi-reference support.

  • Partner
aura-2-es
Text-to-Speech · Deepgram · Hosted

Aura-2 is a context-aware text-to-speech (TTS) model that applies natural pacing, expressiveness, and fillers based on the context of the provided text. The quality of your text input directly impacts the naturalness of the audio output.

  • Batch
  • Partner
  • Real-time
aura-2-en
Text-to-Speech · Deepgram · Hosted

Aura-2 is a context-aware text-to-speech (TTS) model that applies natural pacing, expressiveness, and fillers based on the context of the provided text. The quality of your text input directly impacts the naturalness of the audio output.

  • Batch
  • Partner
  • Real-time
granite-4.0-h-micro
Text Generation · IBM · Hosted

Granite 4.0 instruct models deliver strong performance across benchmarks, achieving industry-leading results in key agentic tasks like instruction following and function calling. These strengths make the models well-suited for a wide range of use cases like retrieval-augmented generation (RAG), multi-agent workflows, and edge deployments.

  • Function calling
flux
Automatic Speech Recognition · Deepgram · Hosted

Flux is the first conversational speech recognition model built specifically for voice agents.

  • Partner
  • Real-time
plamo-embedding-1b
Text Embeddings · pfnet · Hosted

PLaMo-Embedding-1B is a Japanese text embedding model developed by Preferred Networks, Inc. It can convert Japanese text input into numerical vectors and can be used for a wide range of applications, including information retrieval, text classification, and clustering.

gemma-sea-lion-v4-27b-it
Text Generation · aisingapore · Hosted

SEA-LION (Southeast Asian Languages In One Network) is a collection of Large Language Models (LLMs) that have been pretrained and instruct-tuned for the Southeast Asia (SEA) region.

indictrans2-en-indic-1B
Translation · ai4bharat · Hosted

IndicTrans2 is the first open-source transformer-based multilingual neural machine translation (NMT) model that supports high-quality translations across all 22 scheduled Indic languages.

embeddinggemma-300m
Text Embeddings · Google · Hosted

EmbeddingGemma is a 300M-parameter open embedding model from Google, state-of-the-art for its size, built from Gemma 3 (with T5Gemma initialization) and the same research and technology used to create Gemini models. EmbeddingGemma produces vector representations of text, making it well-suited for search and retrieval tasks, including classification, clustering, and semantic similarity search. This model was trained with data in 100+ spoken languages.

aura-1
Text-to-Speech · Deepgram · Hosted

Aura is a context-aware text-to-speech (TTS) model that applies natural pacing, expressiveness, and fillers based on the context of the provided text. The quality of your text input directly impacts the naturalness of the audio output.

  • Batch
  • Partner
  • Real-time
lucid-origin
Text-to-Image · Leonardo · Hosted

Lucid Origin from Leonardo.Ai is their most adaptable and prompt-responsive model to date. Whether you're generating images with sharp graphic design, stunning full-HD renders, or highly specific creative direction, it adheres closely to your prompts, renders text with accuracy, and supports a wide array of visual styles and aesthetics – from stylized concept art to crisp product mockups.

  • Partner
phoenix-1.0
Text-to-Image · Leonardo · Hosted

Phoenix 1.0 is a model by Leonardo.Ai that generates images with exceptional prompt adherence and coherent text.

  • Partner
gpt-oss-20b
Text Generation · OpenAI · Hosted

One of OpenAI's open-weight models, designed for powerful reasoning, agentic tasks, and versatile developer use cases. gpt-oss-20b targets lower-latency, local, or specialized use cases.

  • Function calling
  • Reasoning
smart-turn-v2
Voice Activity Detection · Pipecat · Hosted

The second version of an open-source, community-driven, native audio turn detection model.

  • Batch
  • Real-time
qwen3-embedding-0.6b
Text Embeddings · Qwen · Hosted

The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks.

nova-3
Automatic Speech Recognition · Deepgram · Hosted

Transcribe audio using Deepgram’s speech-to-text model.

  • Batch
  • Partner
  • Real-time
qwen3-30b-a3b-fp8
Text Generation · Qwen · Hosted

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.

  • Batch
  • Function calling
  • Reasoning
gemma-3-12b-it
Text Generation · Google · Hosted

Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Gemma 3 models are multimodal, handling text and image input and generating text output, with a large 128K context window and multilingual support in over 140 languages, and are available in more sizes than previous versions.

  • LoRA
  • Planned deprecation
mistral-small-3.1-24b-instruct
Text Generation · MistralAI · Hosted

Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance. With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks.

  • Function calling
qwq-32b
Text Generation · Qwen · Hosted

QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ is capable of thinking and reasoning, and can achieve significantly better performance in downstream tasks, especially hard problems. QwQ-32B is the medium-sized reasoning model, which achieves competitive performance against state-of-the-art reasoning models such as DeepSeek-R1 and o1-mini.

  • LoRA
  • Reasoning
qwen2.5-coder-32b-instruct
Text Generation · Qwen · Hosted

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). It covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers.

  • LoRA
bge-reranker-base
Text Classification · BAAI · Hosted

Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You get a relevance score by passing a query and a passage to the reranker, and the score can be mapped to a float value in [0, 1] with a sigmoid function.
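
The sigmoid mapping mentioned for the reranker can be sketched as follows. This is a minimal illustration rather than the hosted model's API: the function names and the raw scores are hypothetical, and the scores are assumed to be unbounded values where higher means more relevant.

```python
import math


def sigmoid(score: float) -> float:
    """Map an unbounded relevance score to a value in [0, 1]."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)


def rank_passages(raw_scores: dict[str, float]) -> list[tuple[str, float]]:
    """Return (passage, normalized score) pairs, most relevant first."""
    scored = [(passage, sigmoid(s)) for passage, s in raw_scores.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical raw scores for three passages against one query.
    example = {"passage A": 4.2, "passage B": -1.3, "passage C": 0.0}
    for passage, score in rank_passages(example):
        print(f"{passage}: {score:.3f}")
```

Because the sigmoid is monotonic, normalizing does not change the ranking; it only makes scores comparable against a fixed threshold.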

llama-guard-3-8b
Text Generation · Meta · Hosted

Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated.

  • LoRA
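
Since the classifier's verdict arrives as generated text, callers typically need to parse it. The sketch below assumes the text has "safe" or "unsafe" on its first non-empty line and, when unsafe, comma-separated category codes on the next line. That layout and the helper name `parse_guard_output` are assumptions for illustration, so check the model's own documentation for the exact format.

```python
def parse_guard_output(text: str) -> tuple[bool, list[str]]:
    """Parse a safety verdict into (is_safe, violated_categories).

    Assumed layout (hypothetical, verify against the model's docs):
        safe
    or
        unsafe
        S1,S10
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty classifier output")

    verdict = lines[0].lower()
    if verdict == "safe":
        return True, []
    if verdict == "unsafe":
        categories: list[str] = []
        if len(lines) > 1:
            # Category codes are assumed to be comma-separated on one line.
            categories = [c.strip() for c in lines[1].split(",") if c.strip()]
        return False, categories
    raise ValueError(f"unrecognized verdict: {lines[0]!r}")


if __name__ == "__main__":
    print(parse_guard_output("safe"))
    print(parse_guard_output("unsafe\nS1,S10"))
```

Raising on an unrecognized verdict, rather than defaulting to "safe", keeps a malformed response from silently passing a moderation check.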
deepseek-r1-distill-qwen-32b
Text Generation · DeepSeek · Hosted

DeepSeek-R1-Distill-Qwen-32B is a model distilled from DeepSeek-R1 based on Qwen2.5. It outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

  • Reasoning
llama-3.3-70b-instruct-fp8-fast
Text Generation · Meta · Hosted

Llama 3.3 70B quantized to fp8 precision, optimized to be faster.

  • Batch
  • Function calling
llama-3.2-1b-instruct
Text Generation · Meta · Hosted

The Llama 3.2 instruction-tuned text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks.

llama-3.2-3b-instruct
Text Generation · Meta · Hosted

The Llama 3.2 instruction-tuned text-only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks.

llama-3.2-11b-vision-instruct
Text Generation · Meta · Hosted

The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.

  • LoRA
  • Vision
flux-1-schnell
Text-to-Image · Black Forest Labs · Hosted

FLUX.1 [schnell] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions.

llama-3.1-8b-instruct-awq
Text Generation · Meta · Hosted

Quantized (int4) generative text model with 8 billion parameters from Meta.

  • Planned deprecation
llama-3.1-8b-instruct-fp8
Text Generation · Meta · Hosted

Llama 3.1 8B quantized to FP8 precision.

melotts
Text-to-Speech · MyShell · Hosted

MeloTTS is a high-quality multi-lingual text-to-speech library by MyShell.ai.

llama-3.1-8b-instruct
Text Generation · Meta · Hosted

The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models. The Llama 3.1 instruction-tuned text-only models are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.

  • Planned deprecation
bge-m3
Text Embeddings · BAAI · Hosted

Multi-Functionality, Multi-Linguality, and Multi-Granularity embeddings model.

meta-llama-3-8b-instruct
Text Generation · Meta · Hosted

Generation over generation, Meta Llama 3 demonstrates state-of-the-art performance on a wide range of industry benchmarks and offers new capabilities, including improved reasoning.

  • Planned deprecation
whisper-large-v3-turbo
Automatic Speech Recognition · OpenAI · Hosted

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation.

  • Batch
llama-3-8b-instruct-awq
Text Generation · Meta · Hosted

Quantized (int4) generative text model with 8 billion parameters from Meta.

  • Planned deprecation
llava-1.5-7b-hf (Beta)
Image-to-Text · llava-hf · Hosted

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.

whisper-tiny-en (Beta)
Automatic Speech Recognition · OpenAI · Hosted

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalize to many datasets and domains without the need for fine-tuning. This is the English-only version of the Whisper Tiny model which was trained on the task of speech recognition.

llama-3-8b-instruct
Text Generation · Meta · Hosted

Generation over generation, Meta Llama 3 demonstrates state-of-the-art performance on a wide range of industry benchmarks and offers new capabilities, including improved reasoning.

  • Planned deprecation
mistral-7b-instruct-v0.2 (Beta)
Text Generation · MistralAI · Hosted

The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2. Mistral-7B-v0.2 has the following changes compared to Mistral-7B-v0.1: 32k context window (vs 8k context in v0.1), rope-theta = 1e6, and no Sliding-Window Attention.

  • LoRA
  • Planned deprecation
gemma-7b-it-lora (Beta)
Text Generation · Google · Hosted

This is a Gemma-7B base model that Cloudflare dedicates for inference with LoRA adapters. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

  • LoRA
gemma-2b-it-lora (Beta)
Text Generation · Google · Hosted

This is a Gemma-2B base model that Cloudflare dedicates for inference with LoRA adapters. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

  • LoRA
llama-2-7b-chat-hf-lora (Beta)
Text Generation · Meta · Hosted

This is a Llama 2 base model that Cloudflare dedicates for inference with LoRA adapters. Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format.

  • LoRA
gemma-7b-it (Beta)
Text Generation · Google · Hosted

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants.

  • LoRA
  • Planned deprecation
hermes-2-pro-mistral-7b (Beta)
Text Generation · nousresearch · Hosted

Hermes 2 Pro on Mistral 7B is the new flagship 7B Hermes! Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

  • Function calling
  • Planned deprecation
mistral-7b-instruct-v0.2-lora (Beta)
Text Generation · MistralAI · Hosted

The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2.

  • LoRA
uform-gen2-qwen-500m (Beta)
Image-to-Text · Unum · Hosted

UForm-Gen is a small generative vision-language model primarily designed for Image Captioning and Visual Question Answering. The model was pre-trained on an internal image captioning dataset and fine-tuned on public instruction datasets: SVIT, LVIS, and VQA datasets.

  • Planned deprecation
bart-large-cnn (Beta)
Summarization · Meta · Hosted

BART is a transformer encoder-decoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. You can use this model for text summarization.

  • Planned deprecation
phi-2 (Beta)
Text Generation · Microsoft · Hosted

Phi-2 is a Transformer-based model with a next-word prediction objective, trained on 1.4T tokens from multiple passes on a mixture of Synthetic and Web datasets for NLP and coding.

  • Planned deprecation
sqlcoder-7b-2 (Beta)
Text Generation · Defog · Hosted

This model is intended to be used by non-technical users to understand data inside their SQL databases.

  • Planned deprecation
detr-resnet-50 (Beta)
Object Detection · Meta · Hosted

DEtection TRansformer (DETR) model trained end-to-end on COCO 2017 object detection (118k annotated images).

stable-diffusion-xl-lightning (Beta)
Text-to-Image · ByteDance · Hosted

SDXL-Lightning is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images in a few steps.

dreamshaper-8-lcm
Text-to-Image · lykon · Hosted

Stable Diffusion model that has been fine-tuned to be better at photorealism without sacrificing range.

stable-diffusion-v1-5-img2img (Beta)
Text-to-Image · RunwayML · Hosted

Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images. Img2img generates a new image from an input image with Stable Diffusion.

stable-diffusion-v1-5-inpainting (Beta)
Text-to-Image · RunwayML · Hosted

Stable Diffusion Inpainting is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input, with the extra capability of inpainting the pictures by using a mask.

stable-diffusion-xl-base-1.0 (Beta)
Text-to-Image · Stability.ai · Hosted

Diffusion-based text-to-image generative model by Stability AI. Generates and modifies images based on text prompts.

bge-large-en-v1.5
Text Embeddings · BAAI · Hosted

BAAI general embedding (Large) model that transforms any given text into a 1024-dimensional vector.

  • Batch
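
Fixed-length vectors like these are usually compared with cosine similarity. The sketch below shows that comparison on plain Python lists. The helper name and the toy 3-dimensional vectors are hypothetical, while real embeddings from this model would have 1024 components.

```python
import math
from collections.abc import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy vectors standing in for embeddings of a query and two documents.
    query = [0.1, 0.9, 0.2]
    doc_close = [0.2, 0.8, 0.1]
    doc_far = [0.9, 0.1, 0.0]
    print(cosine_similarity(query, doc_close) > cosine_similarity(query, doc_far))
```

The dimensionality check matters in practice: vectors from models with different output sizes (for example 384 versus 1024) cannot be compared directly.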
bge-small-en-v1.5
Text Embeddings · BAAI · Hosted

BAAI general embedding (Small) model that transforms any given text into a 384-dimensional vector.

  • Batch
llama-2-7b-chat-fp16
Text Generation · Meta · Hosted

Unquantized half-precision (fp16) generative text model with 7 billion parameters from Meta.

  • Planned deprecation
mistral-7b-instruct-v0.1
Text Generation · MistralAI · Hosted

Instruct fine-tuned version of the Mistral-7b generative text model with 7 billion parameters.

  • LoRA
  • Planned deprecation
bge-base-en-v1.5
Text Embeddings · BAAI · Hosted

BAAI general embedding (Base) model that transforms any given text into a 768-dimensional vector.

  • Batch
distilbert-sst-2-int8
Text Classification · HuggingFace · Hosted

Distilled BERT model that was fine-tuned on SST-2 for sentiment classification.

llama-2-7b-chat-int8
Text Generation · Meta · Hosted

Quantized (int8) generative text model with 7 billion parameters from Meta.

  • Planned deprecation
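
For readers unfamiliar with int8 quantization, the sketch below shows one common scheme: symmetric per-tensor quantization with a single scale factor. It is a generic illustration under that assumption, not a description of how this hosted model was quantized, and the function names are hypothetical.

```python
def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Quantize floats to integers in [-127, 127] using one shared scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Fall back to a scale of 1.0 when every value is zero (or the list is empty).
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate float values from the quantized integers."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    for original, approx in zip(weights, restored):
        print(f"{original:+.4f} -> {approx:+.4f}")
```

Each value is stored in one byte instead of two (fp16) or four (fp32), at the cost of a rounding error of at most half the scale per value.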
m2m100-1.2b
Translation · Meta · Hosted

Multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation.

  • Batch
resnet-50
Image Classification · Microsoft · Hosted

A 50-layer image classification CNN trained on more than 1M images from ImageNet.

whisper
Automatic Speech Recognition · OpenAI · Hosted

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

llama-3.1-70b-instruct
Text Generation · Meta · Hosted

The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models. The Llama 3.1 instruction-tuned text-only models are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.

  • Planned deprecation
llama-3.1-8b-instruct-fast
Text Generation · Meta · Hosted

[Fast version] The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction-tuned generative models. The Llama 3.1 instruction-tuned text-only models are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.