Azure Purview (Preview): Unified Data Governance
Just announced at Microsoft Ignite 2020, Azure Purview is now available in public preview. This new service provides unified data governance across on-premises, multi-cloud, and SaaS data. Discover, catalog, classify, and govern your entire data estate.
Note: Azure Purview is in public preview. Some features may change before GA.
Core Capabilities
- Data Map: Automated discovery and cataloging
- Data Catalog: Search and browse data assets
- Data Estate Insights: Governance analytics
- Data Lineage: Track data flow
- Classifications: Sensitive data detection
Creating a Purview Account
az purview account create \
    --name mypurview \
    --resource-group myRG \
    --location eastus \
    --managed-group-name mypurview-managed
Registering Data Sources
Scan Azure, on-premises, and multi-cloud sources:
Azure Data Lake Storage
# Register source
az purview datasource create \
    --account-name mypurview \
    --name adls-source \
    --kind AzureDataLakeStore \
    --data-lake-store-name mystorageaccount
Azure SQL Database
az purview datasource create \
    --account-name mypurview \
    --name sql-source \
    --kind AzureSqlDatabase \
    --server-endpoint myserver.database.windows.net
Running Scans
{
  "name": "weekly-scan",
  "kind": "AzureDataLakeStorageMsi",
  "properties": {
    "scanRulesetName": "AzureDataLakeStorage",
    "scanRulesetType": "System",
    "collection": {
      "referenceName": "mypurview",
      "type": "CollectionReference"
    },
    "credential": {
      "referenceName": "managed-identity",
      "credentialType": "ManagedIdentity"
    }
  },
  "scanTrigger": {
    "recurrence": {
      "frequency": "Week",
      "interval": 1,
      "startTime": "2020-10-31T00:00:00Z"
    }
  }
}
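To make the recurrence block concrete, here is a minimal sketch that lists the run times implied by the `startTime` and `interval` values above. It assumes the first run happens at `startTime` and each later run follows exactly `interval` weeks after the previous one; the service's actual scheduling may differ. The helper name `upcoming_weekly_runs` is hypothetical and not part of any Purview SDK.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def upcoming_weekly_runs(start_time: str, interval_weeks: int, count: int) -> List[datetime]:
    """Return the first `count` run times for a weekly recurrence.

    Assumption: runs fall at start_time, start_time + interval, and so on.
    """
    # Parse the ISO 8601 UTC timestamp used in the scan definition.
    start = datetime.strptime(start_time, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    step = timedelta(weeks=interval_weeks)
    return [start + i * step for i in range(count)]


for run in upcoming_weekly_runs("2020-10-31T00:00:00Z", 1, 3):
    print(run.strftime("%Y-%m-%d %H:%M UTC"))
# 2020-10-31 00:00 UTC
# 2020-11-07 00:00 UTC
# 2020-11-14 00:00 UTC
```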
Classifications
Built-in sensitive data patterns:
| Classification | Example |
|---|---|
| Credit Card Number | 4111-1111-1111-1111 |
| Social Security Number | 123-45-6789 |
| Email Address | user@example.com |
| IP Address | 192.168.1.1 |
| Person Name | John Smith |
Custom Classifications
{
  "name": "EmployeeId",
  "kind": "RegexClassificationRule",
  "properties": {
    "classificationName": "CUSTOM.EmployeeId",
    "pattern": "EMP[0-9]{6}",
    "minimumPercentageMatch": 60
  }
}
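The interaction between `pattern` and `minimumPercentageMatch` can be sketched locally. The snippet below is an illustration and not Purview's scanner: it assumes the threshold is applied to the distinct non-empty values of a column and that a value counts only when the whole string matches the pattern. The function name `column_matches_rule` and the sample values are made up for this example.

```python
import re
from typing import Iterable


def column_matches_rule(values: Iterable[str], pattern: str, minimum_percentage_match: float) -> bool:
    """Return True when enough distinct values match the regex.

    Assumptions: empty values are ignored, duplicates count once,
    and a value must match the pattern in full.
    """
    distinct = {v for v in values if v}
    if not distinct:
        return False
    regex = re.compile(pattern)
    matched = sum(1 for v in distinct if regex.fullmatch(v))
    return 100.0 * matched / len(distinct) >= minimum_percentage_match


# Four of five distinct values match (80%), so the 60% threshold is met.
employee_column = ["EMP000123", "EMP004567", "EMP999999", "EMP000001", "n/a"]
print(column_matches_rule(employee_column, "EMP[0-9]{6}", 60))  # True

# One of four distinct values matches (25%), so the rule does not apply.
mixed_column = ["EMP000123", "x", "y", "z"]
print(column_matches_rule(mixed_column, "EMP[0-9]{6}", 60))  # False
```

Testing a candidate pattern this way against a sample of real column values before registering the rule can help pick a threshold that avoids both missed columns and false positives.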
Data Lineage
Visualize data flow:
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ SQL Server │────▶│ Data Factory │────▶│ Data Lake │
│ (Source) │ │ (Transform) │ │ (Destination)│
└──────────────┘ └──────────────┘ └──────────────┘
│
▼
┌──────────────┐
│ Synapse │
│ Analytics │
└──────────────┘
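Lineage like this is a directed graph, and impact analysis amounts to walking it downstream. The sketch below models the diagram as a plain adjacency list and is not tied to any Purview API. It assumes Synapse Analytics reads from the Data Lake, which the diagram suggests but does not state outright; the `downstream` helper is hypothetical.

```python
from collections import deque
from typing import Dict, List

# Edges point from a source to the assets that consume it.
LINEAGE: Dict[str, List[str]] = {
    "SQL Server": ["Data Factory"],
    "Data Factory": ["Data Lake"],
    "Data Lake": ["Synapse Analytics"],  # assumption: Synapse reads from the lake
    "Synapse Analytics": [],
}


def downstream(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Return every asset reachable from `start`, in breadth-first order."""
    seen: List[str] = []
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.append(node)
        queue.extend(graph.get(node, []))
    return seen


print(downstream(LINEAGE, "SQL Server"))
# ['Data Factory', 'Data Lake', 'Synapse Analytics']
```

A walk like this answers the question "what is affected if this source changes?", which is the main reason to track lineage in the first place.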
REST API
from azure.identity import DefaultAzureCredential
from azure.purview.catalog import PurviewCatalogClient

# Authenticate with whatever credential is available (CLI login, managed identity, ...)
credential = DefaultAzureCredential()
client = PurviewCatalogClient(
    endpoint="https://mypurview.purview.azure.com",
    credential=credential
)

# Search for assets
results = client.discovery.query(
    search_request={
        "keywords": "customer",
        "filter": {
            "and": [
                {"classification": "MICROSOFT.PERSONAL.EMAIL"},
                {"assetType": "Azure SQL Table"}
            ]
        }
    }
)

for result in results["value"]:
    print(f"Asset: {result['qualifiedName']}")
    print(f"Classifications: {result.get('classification', [])}")
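When a script runs several searches, it can help to build the request body in one place. The following is a small sketch of such a helper. `build_search_request` is a hypothetical name, not an SDK function, and it only assembles the same `keywords` and `and` filter structure used above, so it runs without a Purview account.

```python
from typing import Any, Dict, List, Optional


def build_search_request(
    keywords: str,
    classification: Optional[str] = None,
    asset_type: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a search body; all supplied conditions are combined with "and"."""
    conditions: List[Dict[str, str]] = []
    if classification:
        conditions.append({"classification": classification})
    if asset_type:
        conditions.append({"assetType": asset_type})

    request: Dict[str, Any] = {"keywords": keywords}
    if conditions:
        request["filter"] = {"and": conditions}
    return request


request = build_search_request(
    "customer",
    classification="MICROSOFT.PERSONAL.EMAIL",
    asset_type="Azure SQL Table",
)
print(request["filter"]["and"])
# [{'classification': 'MICROSOFT.PERSONAL.EMAIL'}, {'assetType': 'Azure SQL Table'}]
```

The resulting dictionary can be passed as the `search_request` argument in the query call shown earlier.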
Glossary Terms
Define business vocabulary:
{
  "name": "Customer",
  "longDescription": "An individual or organization that purchases products or services",
  "abbreviation": "CUST",
  "status": "Approved",
  "resources": [
    {"displayName": "Data Policy", "url": "https://..."}
  ],
  "experts": [
    {"id": "user@company.com"}
  ]
}
Purview: know your data, govern your data.