Back to Blog
2 min read

AI Safety Guardrails: Implementing Content Filtering in Azure OpenAI

Implementing robust AI safety measures is essential for production deployments. Azure OpenAI provides built-in content filtering, but enterprise applications often need additional custom guardrails.

Understanding Azure OpenAI Content Filtering

Azure OpenAI’s content filtering system operates on four categories: hate, sexual, violence, and self-harm. Each category can be configured with different severity thresholds:

from openai import AzureOpenAI
from azure.ai.contentsafety import ContentSafetyClient
from azure.ai.contentsafety.models import TextCategory, AnalyzeTextOptions
from azure.core.credentials import AzureKeyCredential

class ContentSafetyGuard:
    def __init__(self, openai_client: AzureOpenAI, safety_endpoint: str, safety_key: str):
        self.openai_client = openai_client
        self.safety_client = ContentSafetyClient(
            endpoint=safety_endpoint,
            credential=AzureKeyCredential(safety_key)
        )
        self.severity_thresholds = {
            TextCategory.HATE: 2,
            TextCategory.SEXUAL: 2,
            TextCategory.VIOLENCE: 4,
            TextCategory.SELF_HARM: 2
        }

    def analyze_content(self, text: str) -> dict:
        """Analyze text for safety concerns."""

        request = AnalyzeTextOptions(text=text)
        response = self.safety_client.analyze_text(request)

        violations = []
        for category_result in response.categories_analysis:
            if category_result.severity >= self.severity_thresholds.get(
                category_result.category, 2
            ):
                violations.append({
                    "category": category_result.category.value,
                    "severity": category_result.severity
                })

        return {
            "is_safe": len(violations) == 0,
            "violations": violations,
            "analysis": response
        }

Implementing Custom Business Rules

Add domain-specific safety rules beyond default content filtering:

import re
from typing import List, Callable

class BusinessRuleGuard:
    def __init__(self):
        self.rules: List[Callable[[str], tuple[bool, str]]] = []

    def add_pii_detection(self):
        """Block potential PII in prompts."""
        patterns = {
            "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
            "credit_card": r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b",
            "email": r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b"
        }

        def check_pii(text: str) -> tuple[bool, str]:
            for pii_type, pattern in patterns.items():
                if re.search(pattern, text):
                    return False, f"Potential {pii_type} detected"
            return True, ""

        self.rules.append(check_pii)

    def add_prompt_injection_detection(self):
        """Detect common prompt injection patterns."""
        injection_patterns = [
            r"ignore\s+(previous|above|all)\s+instructions",
            r"disregard\s+(your|the)\s+(rules|guidelines)",
            r"you\s+are\s+now\s+(a|an)\s+"
        ]

        def check_injection(text: str) -> tuple[bool, str]:
            text_lower = text.lower()
            for pattern in injection_patterns:
                if re.search(pattern, text_lower):
                    return False, "Potential prompt injection detected"
            return True, ""

        self.rules.append(check_injection)

Logging and Audit Trail

Maintain comprehensive logs of all content filtering decisions for compliance and incident investigation. Every blocked request should be logged with context while respecting privacy requirements.

Michael John Peña

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.