
Azure Purview (Preview): Unified Data Governance

Just announced at Microsoft Ignite 2020, Azure Purview is now available in public preview. This new service provides unified data governance across on-premises, multi-cloud, and SaaS data. Discover, catalog, classify, and govern your entire data estate.

Note: Azure Purview is in public preview. Some features may change before GA.

Core Capabilities

  • Data Map: Automated discovery and cataloging
  • Data Catalog: Search and browse data assets
  • Data Estate Insights: Governance analytics
  • Data Lineage: Track data flow
  • Classifications: Sensitive data detection

Creating a Purview Account

# Requires the Azure CLI purview extension: az extension add --name purview
az purview account create \
    --name mypurview \
    --resource-group myRG \
    --location eastus \
    --managed-group-name mypurview-managed

Registering Data Sources

Register each source before you scan it. Purview can scan Azure, on-premises, and multi-cloud sources:

Azure Data Lake Storage

# Register source
az purview datasource create \
    --account-name mypurview \
    --name adls-source \
    --kind AzureDataLakeStore \
    --data-lake-store-name mystorageaccount

Azure SQL Database

az purview datasource create \
    --account-name mypurview \
    --name sql-source \
    --kind AzureSqlDatabase \
    --server-endpoint myserver.database.windows.net

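Running Scans

A scan definition names the ruleset, the target collection, the credential, and an optional recurrence. This one runs weekly using the managed identity: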

{
    "name": "weekly-scan",
    "kind": "AzureDataLakeStorageMsi",
    "properties": {
        "scanRulesetName": "AzureDataLakeStorage",
        "scanRulesetType": "System",
        "collection": {
            "referenceName": "mypurview",
            "type": "CollectionReference"
        },
        "credential": {
            "referenceName": "managed-identity",
            "credentialType": "ManagedIdentity"
        }
    },
    "scanTrigger": {
        "recurrence": {
            "frequency": "Week",
            "interval": 1,
            "startTime": "2020-10-31T00:00:00Z"
        }
    }
}
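To make the recurrence block concrete, here is a minimal Python sketch that lists the first few run times implied by a weekly trigger. It only does date arithmetic on the frequency, interval, and startTime values from the definition above. The function name next_runs is hypothetical, and the service's actual scheduling behaviour may differ.

```python
from datetime import datetime, timedelta, timezone

# Recurrence block copied from the scan definition above.
recurrence = {
    "frequency": "Week",
    "interval": 1,
    "startTime": "2020-10-31T00:00:00Z",
}


def next_runs(recurrence, count=3):
    """Return the first `count` run times of a weekly recurrence (illustrative only)."""
    if recurrence["frequency"] != "Week":
        raise ValueError("this sketch only handles weekly recurrences")
    # The trailing 'Z' means UTC, so attach the UTC timezone explicitly.
    start = datetime.strptime(
        recurrence["startTime"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    step = timedelta(weeks=recurrence["interval"])
    return [start + i * step for i in range(count)]


for run in next_runs(recurrence):
    print(run.isoformat())
```

Changing interval to 2 in the dictionary would space the runs a fortnight apart.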

Classifications

Built-in sensitive data patterns:

Classification            Example
Credit Card Number        4111-1111-1111-1111
Social Security Number    123-45-6789
Email Address             user@example.com
IP Address                192.168.1.1
Person Name               John Smith
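Pattern-based detection often pairs a regular expression with a validation step. The Python sketch below illustrates that idea and is not Purview's detection logic: it matches a 16-digit card-like pattern and then applies the Luhn checksum, using the example value from the table. The pattern and the function names are assumptions made for this example.

```python
import re

# Hypothetical pattern: four groups of four digits, optionally separated by '-' or ' '.
CARD_PATTERN = re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit and double every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return bool(digits) and total % 10 == 0


def find_card_numbers(text):
    """Return card-like substrings of `text` that also pass the Luhn check."""
    return [m.group() for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group())]


print(find_card_numbers("Paid with 4111-1111-1111-1111, ref 1234-5678-9012-3456"))
```

The checksum step is what separates a plausible card number from any other 16-digit string, which helps keep false positives down.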

Custom Classifications

{
    "name": "EmployeeId",
    "kind": "RegexClassificationRule",
    "properties": {
        "classificationName": "CUSTOM.EmployeeId",
        "pattern": "EMP[0-9]{6}",
        "minimumPercentageMatch": 60
    }
}
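One way to read this rule: a column is tagged when enough of its values match the pattern. The Python sketch below illustrates that reading on an in-memory sample. The function names, the sample values, and the choice to compute the percentage over non-empty values with a full-string match are all assumptions for illustration, not Purview's implementation.

```python
import re

# Fields copied from the rule definition above.
rule = {
    "classificationName": "CUSTOM.EmployeeId",
    "pattern": "EMP[0-9]{6}",
    "minimumPercentageMatch": 60,
}


def match_percentage(values, pattern):
    """Percentage of non-empty values that fully match `pattern`."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return 0.0
    compiled = re.compile(pattern)
    hits = sum(1 for v in non_empty if compiled.fullmatch(v))
    return 100.0 * hits / len(non_empty)


def classify_column(values, rule):
    """Return the classification name if the threshold is met, else None."""
    pct = match_percentage(values, rule["pattern"])
    if pct >= rule["minimumPercentageMatch"]:
        return rule["classificationName"]
    return None


# Made-up column sample: three valid IDs, one placeholder, one ID that is too short.
sample = ["EMP000123", "EMP004567", "EMP123456", "n/a", "EMP98765"]
print(match_percentage(sample, rule["pattern"]))
print(classify_column(sample, rule))
```

A threshold below 100 tolerates placeholders and dirty values in a column that otherwise holds employee IDs.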

Data Lineage

Visualize data flow:

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ SQL Server   │────▶│ Data Factory │────▶│ Data Lake    │
│ (Source)     │     │ (Transform)  │     │ (Destination)│
└──────────────┘     └──────────────┘     └──────────────┘
                            │
                            ▼
                     ┌──────────────┐
                     │ Synapse      │
                     │ Analytics    │
                     └──────────────┘
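Lineage is naturally a directed graph, so impact analysis ("what is downstream of this table?") becomes a graph traversal. The Python sketch below models the diagram as an adjacency list and walks it breadth-first. The edge into Synapse Analytics is an assumption, since the diagram does not make its origin explicit, and the downstream helper is hypothetical rather than part of any Purview SDK.

```python
from collections import deque

# Edges read from the diagram; the edge into Synapse Analytics is assumed.
lineage = {
    "SQL Server": ["Data Factory"],
    "Data Factory": ["Data Lake", "Synapse Analytics"],
    "Data Lake": [],
    "Synapse Analytics": [],
}


def downstream(graph, start):
    """Return every asset reachable from `start`, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


print(downstream(lineage, "SQL Server"))
```

Reversing the edges and running the same traversal answers the opposite question: which upstream sources feed a given asset.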

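REST API

The catalog REST API is easiest to call through the Python SDK, which wraps it:

# pip install azure-identity azure-purview-catalog
from azure.identity import DefaultAzureCredential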
from azure.purview.catalog import PurviewCatalogClient

credential = DefaultAzureCredential()
client = PurviewCatalogClient(
    endpoint="https://mypurview.purview.azure.com",
    credential=credential
)

# Search for assets
results = client.discovery.query(
    search_request={
        "keywords": "customer",
        "filter": {
            "and": [
                {"classification": "MICROSOFT.PERSONAL.EMAIL"},
                {"assetType": "Azure SQL Table"}
            ]
        }
    }
)

for result in results["value"]:
    print(f"Asset: {result['qualifiedName']}")
    print(f"Classifications: {result.get('classification', [])}")
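Search responses are plain dictionaries, so they are easy to post-process. As a sketch, assuming the response shape used in the loop above (a value list whose items carry qualifiedName and classification), the hypothetical helper below groups asset names by classification. The sample response is made up for illustration and is not real service output.

```python
from collections import defaultdict


def assets_by_classification(results):
    """Map each classification label to the qualified names that carry it."""
    grouped = defaultdict(list)
    for item in results.get("value", []):
        for label in item.get("classification", []):
            grouped[label].append(item["qualifiedName"])
    return dict(grouped)


# Made-up response for illustration; names and labels are placeholders.
sample_results = {
    "value": [
        {
            "qualifiedName": "example://sales/dbo/customer",
            "classification": ["MICROSOFT.PERSONAL.EMAIL"],
        },
        {
            "qualifiedName": "example://sales/dbo/customer_contact",
            "classification": ["MICROSOFT.PERSONAL.EMAIL", "EXAMPLE.LABEL"],
        },
        {"qualifiedName": "example://sales/dbo/orders"},
    ]
}

for label, assets in assets_by_classification(sample_results).items():
    print(label, assets)
```

Assets with no classification key are skipped, which mirrors the result.get('classification', []) default in the loop above.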

Glossary Terms

Define business vocabulary:

{
    "name": "Customer",
    "longDescription": "An individual or organization that purchases products or services",
    "abbreviation": "CUST",
    "status": "Approved",
    "resources": [
        {"displayName": "Data Policy", "url": "https://..."}
    ],
    "experts": [
        {"id": "user@company.com"}
    ]
}

Purview: know your data, govern your data.

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.