Skip to content
Cloudflare Docs

PII detection

AI Security for Apps (formerly Firewall for AI) can detect personally identifiable information (PII) in incoming LLM prompts. There are two approaches to PII detection, and you can use them together for layered protection:

  • Fuzzy detection (AI-powered) — AI Security for Apps uses an AI model to identify common PII types in the prompt content. This approach catches PII even when it appears in natural language or unexpected formats.
  • Exact detection (regex) — You write a WAF custom rule with a regular expression on the raw request body. This approach is ideal for organization-specific identifiers with a known, predictable format.

Fuzzy PII detection

When AI Security for Apps is enabled and a request arrives at a cf-llm labeled endpoint, it scans the prompt for PII and populates two fields:

The detection is based on Presidio, a data protection and de-identification SDK. Refer to the cf.llm.prompt.pii_categories field reference for the full list of recognized categories.

Supported PII categories

CategoryDescription
CREDIT_CARDCredit card number
CRYPTOCryptocurrency wallet address
DATE_TIMEDate or time expression
EMAIL_ADDRESSEmail address
IBAN_CODEInternational bank account number
IP_ADDRESSIP address
NRPNationality, religious, or political group
LOCATIONPhysical location or address
PERSONPerson name
PHONE_NUMBERPhone number
MEDICAL_LICENSEMedical license number
URLURL
US_BANK_NUMBERUS bank account number
US_DRIVER_LICENSEUS driver license number
US_ITINUS Individual Taxpayer Identification Number
US_PASSPORTUS passport number
US_SSNUS Social Security Number
UK_NHSUK National Health Service number
UK_NINOUK National Insurance Number
ES_NIFSpanish tax identification number
ES_NIESpanish foreigner identification number
IT_FISCAL_CODEItalian fiscal code
IT_DRIVER_LICENSEItalian driver license
IT_VAT_CODEItalian VAT code
IT_PASSPORTItalian passport number
IT_IDENTITY_CARDItalian identity card
PL_PESELPolish national identification number
SG_NRIC_FINSingapore National Registration Identity Card / Foreign Identification Number
SG_UENSingapore Unique Entity Number
AU_ABNAustralian Business Number
AU_ACNAustralian Company Number
AU_TFNAustralian Tax File Number
AU_MEDICAREAustralian Medicare number
IN_PANIndian Permanent Account Number
IN_AADHAARIndian Aadhaar number
IN_VEHICLE_REGISTRATIONIndian vehicle registration number
IN_VOTERIndian voter ID
IN_PASSPORTIndian passport number
FI_PERSONAL_IDENTITY_CODEFinnish personal identity code

Be specific to reduce false positives

The cf.llm.prompt.pii_detected field returns true when any PII category is detected — including broad categories like PERSON, DATE_TIME, and LOCATION that frequently appear in normal conversation. Blocking based on this field alone will produce a high false-positive rate for most applications.

Instead, build rules against cf.llm.prompt.pii_categories and list only the categories that matter for your use case. For example, a customer support chatbot may need to block credit card numbers and SSNs but can safely ignore person names and dates. Start with the narrowest set of categories, monitor matches in Security Analytics, and expand only as needed.

Example rules — fuzzy detection

Block any request containing PII

  • When incoming requests match:

    FieldOperatorValue
    LLM PII DetectedequalsTrue

    Expression when using the editor:
    (cf.llm.prompt.pii_detected)

  • Action: Block

Block only specific PII categories

  • When incoming requests match:

    FieldOperatorValue
    LLM PII Categoriesis inCredit Card

    Expression when using the editor:
    (any(cf.llm.prompt.pii_categories[*] in {"CREDIT_CARD"}))

  • Action: Block

Log email addresses but block credit cards and SSNs

Create two custom rules:

  1. A rule with action Block and the following expression:
    (any(cf.llm.prompt.pii_categories[*] in {"CREDIT_CARD" "US_SSN"}))

  2. A rule with action Log and the following expression:
    (any(cf.llm.prompt.pii_categories[*] in {"EMAIL_ADDRESS"}))

Exact PII detection (regex)

If you need to detect custom PII formats specific to your organization — such as internal employee IDs, patient record numbers, or proprietary account identifiers — you can create a WAF custom rule using a regex match on the raw body (http.request.body.raw field).

This approach complements fuzzy detection by covering formats the AI model does not natively recognize.

Example: Detect employee IDs

In the following example, an organization uses employee IDs in the format EMP- followed by exactly six digits (for example, EMP-482910).

Create a custom rule with the following configuration:

  • When incoming requests match:

    FieldOperatorValue
    Raw request bodymatches regexEMP-[0-9]{6}

    Expression when using the editor:
    (http.request.body.raw matches "EMP-[0-9]{6}")

  • Action: Block

  • With response type: Custom JSON

  • Response body: { "error": "Request blocked: employee ID detected in prompt." }

Scope to a specific endpoint

To limit this rule to only your LLM endpoint, combine it with a path condition:

FieldOperatorValueLogic
URI Pathequals/api/chatAnd
Raw request bodymatches regexEMP-[0-9]{6}

Expression when using the editor:
(http.request.uri.path eq "/api/chat" and http.request.body.raw matches "EMP-[0-9]{6}")

More regex examples

Custom PII typeExample formatRegex pattern
Employee IDEMP-482910EMP-[0-9]{6}
Patient record numberPAT/2024/00391PAT/[0-9]{4}/[0-9]{5}
Internal account IDACCT-XX-99999ACCT-[A-Z]{2}-[0-9]{5}
Custom API key prefixsk_live_abc123...sk_live_[a-zA-Z0-9]{20,}

Considerations for regex rules

  • Cloudflare Plan requirement. Regex operators (matches and ~) require a Business or Enterprise plan.
  • Body size limit. The http.request.body.raw field inspects a limited portion of the request body. The exact limit varies by plan.
  • JSON payloads. The raw body includes the full JSON structure. Your regex should account for the fact that the prompt text is nested inside a JSON string.
  • Performance. Complex regex patterns can impact rule evaluation time. Keep patterns as specific as possible.

Combine both approaches

You can use fuzzy and exact detection together for layered protection:

(cf.llm.prompt.pii_detected or http.request.body.raw matches "EMP-[0-9]{6}")

This rule blocks requests where either the AI model detects any built-in PII category or the regex matches your custom identifier format.