Azure Purview (Preview): Unified Data Governance

Microsoft announced Azure Purview at Ignite this week and the timing could not be better. Every consulting engagement I’ve been on this year keeps coming back to the same question: “where does this data live, and can we trust it?” Purview’s pitch is unified governance across on-premises, multi-cloud, and SaaS sources, with discovery, cataloging, classification, and lineage in one service. It’s still in public preview, so the rough edges are real, but it’s worth getting hands-on early.

Note: Azure Purview is in public preview. Some features may change before GA.

Core Capabilities

  • Data Map: Automated discovery and cataloging
  • Data Catalog: Search and browse data assets
  • Data Estate Insights: Governance analytics
  • Data Lineage: Track data flow
  • Classifications: Sensitive data detection

Creating a Purview Account

# The purview commands ship as an Azure CLI extension
az extension add --name purview

az purview account create \
    --name mypurview \
    --resource-group myRG \
    --location eastus \
    --managed-group-name mypurview-managed

Registering Data Sources

Purview can scan Azure, on-premises, and multi-cloud sources. Register each source before scanning it:

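Azure Data Lake Storage Gen2

# Register the source through the Purview scanning data-plane REST API
az rest --method put \
    --resource "https://purview.azure.net" \
    --url "https://mypurview.purview.azure.com/scan/datasources/adls-source?api-version=2022-02-01-preview" \
    --body '{"kind": "AdlsGen2", "properties": {"endpoint": "https://mystorageaccount.dfs.core.windows.net/"}}'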

Azure SQL Database

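az rest --method put \
    --resource "https://purview.azure.net" \
    --url "https://mypurview.purview.azure.com/scan/datasources/sql-source?api-version=2022-02-01-preview" \
    --body '{"kind": "AzureSqlDatabase", "properties": {"serverEndpoint": "myserver.database.windows.net"}}'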
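Both registrations boil down to a PUT against the Purview scanning data plane. The sketch below only builds the URL and JSON body for each source so the payloads can be inspected offline. It assumes an ADLS Gen2 account, the `/scan/datasources/{name}` route, and the `2022-02-01-preview` API version; `datasource_request` is a hypothetical helper, not part of any Azure SDK.

```python
import json
from urllib.parse import quote

# Assumed API version for the Purview scanning data plane.
API_VERSION = "2022-02-01-preview"


def datasource_request(account: str, name: str, kind: str, properties: dict) -> tuple:
    """Hypothetical helper: return (url, json_body) for registering one data source."""
    url = (
        f"https://{account}.purview.azure.com/scan/datasources/"
        f"{quote(name)}?api-version={API_VERSION}"
    )
    body = json.dumps({"kind": kind, "properties": properties})
    return url, body


# ADLS Gen2: the storage account is identified by its DFS endpoint.
adls_url, adls_body = datasource_request(
    "mypurview", "adls-source", "AdlsGen2",
    {"endpoint": "https://mystorageaccount.dfs.core.windows.net/"},
)

# Azure SQL Database: identified by the logical server endpoint.
sql_url, sql_body = datasource_request(
    "mypurview", "sql-source", "AzureSqlDatabase",
    {"serverEndpoint": "myserver.database.windows.net"},
)

print("PUT", adls_url)
print(adls_body)
```

The returned body can be passed to `az rest --body` or to any HTTP client that attaches a token for `https://purview.azure.net`.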

Running Scans

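With the managed-identity (MSI) scan kind, Purview authenticates with its own managed identity, so the scan needs no credential reference. The schedule is a separate trigger resource.

Scan definition (PUT https://mypurview.purview.azure.com/scan/datasources/adls-source/scans/weekly-scan):

{
    "kind": "AdlsGen2Msi",
    "properties": {
        "scanRulesetName": "AdlsGen2",
        "scanRulesetType": "System",
        "collection": {
            "referenceName": "mypurview",
            "type": "CollectionReference"
        }
    }
}

Weekly trigger (PUT https://mypurview.purview.azure.com/scan/datasources/adls-source/scans/weekly-scan/triggers/default):

{
    "properties": {
        "recurrence": {
            "frequency": "Week",
            "interval": 1,
            "startTime": "2020-10-31T00:00:00Z"
        }
    }
}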
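To sanity-check what `frequency`, `interval`, and `startTime` mean together, here is a small sketch that lists upcoming run times. It assumes the first run fires at `startTime` and later runs follow every `interval` weeks or months; `next_runs` and `add_months` are hypothetical helpers, and clamping a month-end start (October 31 to November 30) is this sketch's own choice, not documented Purview behavior.

```python
import calendar
from datetime import datetime, timedelta


def add_months(dt: datetime, months: int) -> datetime:
    """Shift by whole months, clamping the day (Oct 31 + 1 month -> Nov 30)."""
    index = dt.month - 1 + months
    year, month = dt.year + index // 12, index % 12 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def next_runs(start_time: str, frequency: str, interval: int, count: int) -> list:
    """Hypothetical helper: the first `count` runs of a Week or Month recurrence."""
    # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalize it.
    start = datetime.fromisoformat(start_time.replace("Z", "+00:00"))
    if frequency == "Week":
        return [start + timedelta(weeks=interval * i) for i in range(count)]
    if frequency == "Month":
        # Offset from the original start each time so day clamping does not drift.
        return [add_months(start, interval * i) for i in range(count)]
    raise ValueError(f"unsupported frequency: {frequency}")


for run in next_runs("2020-10-31T00:00:00Z", "Week", 1, 3):
    print(run.date())
```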

Classifications

Built-in sensitive data patterns:

| Classification         | Example             |
|------------------------|---------------------|
| Credit Card Number     | 4111-1111-1111-1111 |
| Social Security Number | 123-45-6789         |
| Email Address          | user@example.com    |
| IP Address             | 192.168.1.1         |
| Person Name            | John Smith          |
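The table's patterns can be illustrated with plain regular expressions. These regexes are simplified stand-ins, not Purview's actual classification rules, and Person Name is left out because names cannot be matched by a simple pattern. A Luhn checksum, a standard validation for card numbers, is added so random 16-digit strings are rejected.

```python
import re

# Simplified stand-ins for a few built-in patterns (not Purview's real rules).
PATTERNS = {
    "Credit Card Number": re.compile(r"\d{4}-?\d{4}-?\d{4}-?\d{4}"),
    "Social Security Number": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "Email Address": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
    "IP Address": re.compile(r"(?:\d{1,3}\.){3}\d{1,3}"),
}


def luhn_ok(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    digits = [int(d) for d in number if d.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(value: str) -> list:
    """Return the names of every pattern that matches the whole value."""
    hits = [name for name, rx in PATTERNS.items() if rx.fullmatch(value)]
    if "Credit Card Number" in hits and not luhn_ok(value):
        hits.remove("Credit Card Number")
    return hits


for sample in ["4111-1111-1111-1111", "123-45-6789", "user@example.com", "192.168.1.1"]:
    print(sample, classify(sample))
```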

Custom Classifications

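Create the CUSTOM.EmployeeId classification first, then attach a rule to it (PUT https://mypurview.purview.azure.com/scan/classificationrules/EmployeeId):

{
    "kind": "Custom",
    "properties": {
        "classificationName": "CUSTOM.EmployeeId",
        "ruleStatus": "Enabled",
        "minimumPercentageMatch": 60,
        "dataPatterns": [
            {"kind": "Regex", "pattern": "^EMP[0-9]{6}$"}
        ]
    }
}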
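The `minimumPercentageMatch` threshold decides when a whole column gets the classification, not a single value. A sketch of that decision, assuming the percentage is computed over the non-empty sampled values of a column; `column_matches` is hypothetical, and `re.fullmatch` anchors the pattern so a value like `EMP1234567` does not count.

```python
import re

# Mirrors the custom rule: a pattern plus a minimum match percentage.
PATTERN = re.compile(r"EMP[0-9]{6}")
MINIMUM_PERCENTAGE_MATCH = 60


def column_matches(values: list, pattern=PATTERN, threshold=MINIMUM_PERCENTAGE_MATCH) -> bool:
    """Classify a column if enough non-empty sampled values fully match the pattern."""
    sampled = [v for v in values if v]  # ignore empty or missing cells
    if not sampled:
        return False
    hits = sum(1 for v in sampled if pattern.fullmatch(v))
    return 100 * hits / len(sampled) >= threshold


print(column_matches(["EMP000001", "EMP123456", "EMP654321", "N/A", "TEMP"]))  # 3 of 5 = 60% -> True
```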

Data Lineage

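Purview shows lineage as a graph of sources, processing steps, and destinations. Lineage from services such as Azure Data Factory is captured once they are connected to the Purview account: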

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ SQL Server   │────▶│ Data Factory │────▶│ Data Lake    │
│ (Source)     │     │ (Transform)  │     │ (Destination)│
└──────────────┘     └──────────────┘     └──────────────┘
                            │
                            ▼
                     ┌──────────────┐
                     │ Synapse      │
                     │ Analytics    │
                     └──────────────┘
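The diagram is a directed graph, so impact analysis is a graph walk: everything downstream of a source is affected when it changes. Purview stores lineage as Apache Atlas entities; this sketch flattens that into a hypothetical adjacency dictionary copied from the diagram instead of calling any Purview API.

```python
from collections import deque

# Lineage edges from the diagram: upstream -> downstream.
EDGES = {
    "SQL Server": ["Data Factory"],
    "Data Factory": ["Data Lake", "Synapse Analytics"],
    "Data Lake": [],
    "Synapse Analytics": [],
}


def downstream(node: str, edges: dict = EDGES) -> set:
    """Breadth-first walk returning every asset that depends on `node`."""
    seen, queue = set(), deque([node])
    while queue:
        for child in edges.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def upstream(node: str, edges: dict = EDGES) -> set:
    """Reverse every edge, then reuse the same walk to find all sources of `node`."""
    reverse = {}
    for src, dsts in edges.items():
        for dst in dsts:
            reverse.setdefault(dst, []).append(src)
    return downstream(node, reverse)


print(sorted(downstream("SQL Server")))
print(sorted(upstream("Synapse Analytics")))
```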

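REST API

The azure-purview-catalog Python package wraps the catalog REST API. This search looks for Azure SQL tables matching "customer" that are classified as containing email addresses: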

# pip install azure-identity azure-purview-catalog
from azure.identity import DefaultAzureCredential
from azure.purview.catalog import PurviewCatalogClient

# DefaultAzureCredential tries environment variables, managed identity, az login, etc.
credential = DefaultAzureCredential()
client = PurviewCatalogClient(
    endpoint="https://mypurview.purview.azure.com",
    credential=credential
)

# Keyword search, narrowed to Azure SQL tables with the built-in email classification
results = client.discovery.query(
    search_request={
        "keywords": "customer",
        "filter": {
            "and": [
                {"classification": "MICROSOFT.PERSONAL.EMAIL"},
                {"entityType": "azure_sql_table"}
            ]
        }
    }
)

# Each hit is a plain dict; qualifiedName uniquely identifies the asset
for result in results["value"]:
    print(f"Asset: {result['qualifiedName']}")
    print(f"Classifications: {result.get('classification', [])}")
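A single query returns one page of hits. A sketch of paging through all of them, assuming the search body accepts `offset` and `limit` fields (check the limits for the API version you target); `iter_search` and `fake_query` are hypothetical, and the fake lets the loop run without an Azure account.

```python
from typing import Callable, Iterator


def iter_search(query: Callable[[dict], dict], request: dict, page_size: int = 50) -> Iterator[dict]:
    """Hypothetical helper: yield every hit by re-issuing the search with offset/limit."""
    offset = 0
    while True:
        page = query({**request, "offset": offset, "limit": page_size})
        hits = page.get("value", [])
        yield from hits
        # A short (or empty) page means there is nothing left to fetch.
        if len(hits) < page_size:
            return
        offset += page_size


# Offline stand-in for client.discovery.query so the paging logic runs without Azure.
FAKE_ASSETS = [{"qualifiedName": f"asset-{i}"} for i in range(7)]


def fake_query(body: dict) -> dict:
    start = body["offset"]
    return {"value": FAKE_ASSETS[start:start + body["limit"]]}


names = [hit["qualifiedName"] for hit in iter_search(fake_query, {"keywords": "customer"}, page_size=3)]
print(names)
```

Against a real account, pass `lambda body: client.discovery.query(search_request=body)` as `query`.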

Glossary Terms

Define business vocabulary:

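Term body (POST https://mypurview.purview.azure.com/catalog/api/atlas/v2/glossary/term); contacts are referenced by Azure AD object ID:

{
    "name": "Customer",
    "anchor": {"glossaryGuid": "<glossary-guid>"},
    "longDescription": "An individual or organization that purchases products or services",
    "abbreviation": "CUST",
    "status": "Approved",
    "resources": [
        {"displayName": "Data Policy", "url": "https://..."}
    ],
    "contacts": {
        "Expert": [
            {"id": "<azure-ad-object-id>", "info": "user@company.com"}
        ]
    }
}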
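Building term payloads in code makes it easy to catch typos before they reach the API. A minimal sketch, assuming the four statuses shown in the Purview glossary (Draft, Approved, Alert, Expired); `glossary_term` is a hypothetical helper that covers only a few fields from the example.

```python
import json

# Assumed set of glossary term statuses.
VALID_STATUSES = {"Draft", "Approved", "Alert", "Expired"}


def glossary_term(name: str, description: str, status: str = "Draft", abbreviation: str = "") -> dict:
    """Hypothetical helper: build a term body and reject unknown statuses early."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status {status!r}; expected one of {sorted(VALID_STATUSES)}")
    term = {"name": name, "longDescription": description, "status": status}
    if abbreviation:
        term["abbreviation"] = abbreviation
    return term


print(json.dumps(glossary_term(
    "Customer",
    "An individual or organization that purchases products or services",
    status="Approved",
    abbreviation="CUST",
), indent=4))
```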

Purview: know your data, govern your data.

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.