ADX Continuous Export for Data Archival and Compliance
Azure Data Explorer’s continuous export feature periodically writes newly ingested data to external storage, enabling cost-effective long-term archival and compliance scenarios. Let’s explore how to implement it for monitoring data.
Understanding Continuous Export
Continuous export:
- Runs periodically on newly ingested data
- Exports to Azure Blob Storage or Azure Data Lake
- Supports various formats (Parquet, CSV, JSON)
- Enables cost-effective cold storage
Setting Up External Storage
Create Storage Account
# Create storage account for exports
az storage account create \
--name adxarchive \
--resource-group adx-rg \
--location eastus \
--sku Standard_LRS \
--kind StorageV2 \
--enable-hierarchical-namespace true
# Create container for exports
az storage container create \
--name monitoring-archive \
--account-name adxarchive
Grant ADX Access
# Get ADX cluster principal ID
ADX_PRINCIPAL=$(az kusto cluster show \
--name myadxcluster \
--resource-group adx-rg \
--query identity.principalId -o tsv)
# Grant Storage Blob Data Contributor role
az role assignment create \
--role "Storage Blob Data Contributor" \
--assignee $ADX_PRINCIPAL \
--scope /subscriptions/{sub}/resourceGroups/adx-rg/providers/Microsoft.Storage/storageAccounts/adxarchive
Creating External Tables
// Create external table pointing to storage
.create external table ArchivedContainerLogs (
TimeGenerated: datetime,
Computer: string,
ContainerID: string,
LogEntry: string,
LogEntrySource: string,
Namespace: string,
PodName: string
)
kind=storage
partition by (Year: datetime = startofyear(TimeGenerated))
pathformat = (datetime_pattern("'year='yyyy", Year))
dataformat=parquet
(
h@'https://adxarchive.blob.core.windows.net/monitoring-archive;impersonate'
)
Configuring Continuous Export
Basic Export Configuration
// Create continuous export job
.create-or-alter continuous-export ContainerLogsExport
over (ContainerLogs)
to table ArchivedContainerLogs
with (
intervalBetweenRuns=1h,
forcedLatency=10m,
sizeLimit=104857600 // 100MB per file
)
<|
ContainerLogs
| project TimeGenerated, Computer, ContainerID, LogEntry, LogEntrySource, Namespace, PodName
Export with Aggregation
// Export aggregated data for efficient storage
// (assumes an ArchivedMetrics external table exists, created like ArchivedContainerLogs above)
.create-or-alter continuous-export HourlyMetricsExport
over (PerfMetrics)
to table ArchivedMetrics
with (
intervalBetweenRuns=1h,
forcedLatency=5m
)
<|
PerfMetrics
| summarize
AvgValue = avg(CounterValue),
MinValue = min(CounterValue),
MaxValue = max(CounterValue),
SampleCount = count()
by bin(TimeGenerated, 1h), Computer, ObjectName, CounterName, InstanceName
Managing Exports
Check Export Status
// View all continuous exports
.show continuous-exports
// Show specific export details
.show continuous-export ContainerLogsExport
// View export failures
.show continuous-export ContainerLogsExport failures
Monitor Export Progress
// List recently exported artifacts
.show continuous-export ContainerLogsExport exported-artifacts
| top 10 by Timestamp desc
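Beyond listing artifacts, it’s useful to track export throughput over time. A sketch that aggregates the same exported-artifacts output (NumRecords is one of its documented columns):
// Files and records exported per hour
.show continuous-export ContainerLogsExport exported-artifacts
| summarize Files = count(), Records = sum(NumRecords) by bin(Timestamp, 1h)
| order by Timestamp desc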
Pause and Resume
// Disable export
.disable continuous-export ContainerLogsExport
// Enable export (resumes from the point it stopped, provided the data
// is still within the source table's retention period)
.enable continuous-export ContainerLogsExport
Querying Archived Data
Direct External Table Query
// Query archived data
external_table('ArchivedContainerLogs')
| where TimeGenerated between (datetime(2021-01-01) .. datetime(2021-06-30))
| where LogEntry contains "error"
| summarize count() by bin(TimeGenerated, 1d), Namespace
Union Hot and Cold Data
// Combine current and archived data
let hotData = ContainerLogs | where TimeGenerated > ago(30d);
let coldData = external_table('ArchivedContainerLogs') | where TimeGenerated <= ago(30d);
union hotData, coldData
| where LogEntry contains "critical"
| summarize count() by bin(TimeGenerated, 1d)
| render timechart
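If you query across hot and cold data often, the union pattern is worth wrapping in a stored function so callers don’t need to know where the boundary sits. A sketch (the function name is illustrative; more on functions in tomorrow’s post):
// Single entry point over hot and archived logs
.create-or-alter function AllContainerLogs() {
    union
        (ContainerLogs | where TimeGenerated > ago(30d)),
        (external_table('ArchivedContainerLogs') | where TimeGenerated <= ago(30d))
}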
Compliance Scenarios
Retention Automation
// Policy: Delete hot data older than 30 days
.alter table ContainerLogs policy retention ```
{
"SoftDeletePeriod": "30.00:00:00",
"Recoverability": "Disabled"
}```
// Archived data lives in storage with its own retention policy
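Before shortening hot retention, double-check the effective policy and make sure the soft-delete window comfortably exceeds the export interval plus forcedLatency, so data is always exported before it ages out:
// Confirm the current retention policy on the hot table
.show table ContainerLogs policy retention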
Audit Trail Export
// Export audit-relevant events
// (assumes an ArchivedAuditLogs external table exists, defined like ArchivedContainerLogs)
.create-or-alter continuous-export AuditExport
over (ContainerLogs)
to table ArchivedAuditLogs
with (intervalBetweenRuns=15m)
<|
ContainerLogs
| where LogEntry contains "login" or LogEntry contains "access" or LogEntry contains "permission"
| project TimeGenerated, Computer, ContainerID, LogEntry, Namespace, PodName
Cost Optimization
Storage Tiering
# Set lifecycle policy for automatic tiering
az storage account management-policy create \
--account-name adxarchive \
--resource-group adx-rg \
--policy '{
"rules": [
{
"name": "archiveOldData",
"type": "Lifecycle",
"definition": {
"filters": {
"prefixMatch": ["monitoring-archive/"],
"blobTypes": ["blockBlob"]
},
"actions": {
"baseBlob": {
"tierToCool": {"daysAfterModificationGreaterThan": 30},
"tierToArchive": {"daysAfterModificationGreaterThan": 90},
"delete": {"daysAfterModificationGreaterThan": 365}
}
}
}
}
]
}'
Note that blobs moved to the Archive tier are offline: external table queries that touch them will fail until the blobs are rehydrated to the Hot or Cool tier, so reserve Archive for data you rarely need to query.
Parquet Format Benefits
// Parquet offers strong columnar compression and efficient scans.
// The format comes from the external table's dataformat setting
// (parquet in ArchivedContainerLogs above); larger files compress better.
.create-or-alter continuous-export CompressedExport
over (ContainerLogs)
to table ArchivedContainerLogs
with (
intervalBetweenRuns=1h,
sizeLimit=524288000 // 500MB per file for better compression
)
<|
ContainerLogs
Terraform Configuration
resource "azurerm_storage_account" "archive" {
name = "adxarchive"
resource_group_name = azurerm_resource_group.main.name
location = azurerm_resource_group.main.location
account_tier = "Standard"
account_replication_type = "LRS"
is_hns_enabled = true
blob_properties {
delete_retention_policy {
days = 365
}
}
}
resource "azurerm_storage_container" "monitoring" {
name = "monitoring-archive"
storage_account_name = azurerm_storage_account.archive.name
container_access_type = "private"
}
resource "azurerm_role_assignment" "adx_storage" {
scope = azurerm_storage_account.archive.id
role_definition_name = "Storage Blob Data Contributor"
principal_id = azurerm_kusto_cluster.adx.identity[0].principal_id
}
Best Practices
- Choose appropriate intervals - Balance freshness vs. efficiency
- Use Parquet format - Best compression and query performance
- Partition by time - Enables efficient range queries
- Set size limits - Prevent too many small files
- Monitor export health - Alert on failures (see the sketch after this list)
- Implement storage lifecycle - Automate tiering and deletion
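For the monitoring bullet, one approach is a scheduled health-check query over the continuous-export metadata; a sketch, assuming the documented output columns of .show continuous-exports (names like LastRunResult may vary by service version):
// Flag enabled exports that failed their last run or have fallen behind
.show continuous-exports
| where IsDisabled == false
| where LastRunResult != "Completed" or ExportedTo < ago(2h)
| project Name, LastRunTime, LastRunResult, ExportedTo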
Conclusion
Continuous export enables cost-effective long-term data retention while maintaining query capability. By combining hot ADX storage with cold blob storage, you can meet compliance requirements without breaking the budget.
Tomorrow, we’ll explore Kusto functions for reusable query patterns and automation.