Azure Arc-Enabled Data Services: Run Azure Data Anywhere
Azure Arc-enabled data services bring Azure SQL Managed Instance and PostgreSQL Hyperscale to any Kubernetes cluster, whether it runs on-premises, at the edge, or in another cloud. At Ignite 2021, Microsoft announced general availability along with new features that make running cloud-native databases anywhere a practical reality.
What Are Arc-Enabled Data Services?
Arc-enabled data services provide:
- Azure SQL Managed Instance: Full SQL Server compatibility
- Azure PostgreSQL Hyperscale: Distributed PostgreSQL with Citus
- Evergreen updates: Always current with Azure SQL updates
- Flexible deployment: Direct or indirect connectivity modes
- Azure integration: Billing, monitoring, and security from Azure
Architecture Overview
┌─────────────────────────────────────────┐
│                  Azure                  │
│ ┌──────────────┐  ┌──────────────────┐  │
│ │ Azure Portal │  │  Azure Monitor   │  │
│ └──────────────┘  └──────────────────┘  │
│ ┌──────────────┐  ┌──────────────────┐  │
│ │ ARM/Billing  │  │     Azure AD     │  │
│ └──────────────┘  └──────────────────┘  │
└───────────────────▲─────────────────────┘
                    │
┌───────────────────┼─────────────────────┐
│ Kubernetes        │ (On-Prem/Edge/Cloud)│
│ ┌─────────────────┴────────────────────┐│
│ │           Data Controller            ││
│ │     (orchestrates data services)     ││
│ └──────────────────────────────────────┘│
│ ┌──────────────┐  ┌──────────────────┐  │
│ │  SQL MI #1   │  │  PostgreSQL #1   │  │
│ │  SQL MI #2   │  │  PostgreSQL #2   │  │
│ └──────────────┘  └──────────────────┘  │
└─────────────────────────────────────────┘
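Once the data controller (deployed in the next section) is running, its components and the custom resource types it manages can be inspected directly with kubectl. A quick sanity check, assuming the arc-data namespace used throughout this post:
# List the pods that make up the data controller (controller, monitoring stack, etc.)
kubectl get pods -n arc-data
# List the Arc data services custom resource definitions installed on the cluster
kubectl get crds | grep arcdata.microsoft.com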
Deploying the Data Controller
Prerequisites
# Install Azure Data CLI extension
az extension add --name arcdata
# Verify Kubernetes cluster
kubectl get nodes
kubectl get storageclasses
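You also need to be signed in to the subscription that will hold the Arc resources. A typical preamble (the subscription ID is a placeholder):
# Sign in and select the target subscription
az login
az account set --subscription your-subscription-id
# Confirm the arcdata extension installed correctly
az extension show --name arcdata --query version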
Create Data Controller
# Create the data controller (direct connectivity mode)
# Direct mode deploys through Azure and requires a cluster that is already
# Arc-connected with a custom location (the custom location name is a placeholder;
# list available profile names with: az arcdata dc config list)
az arcdata dc create \
  --name arc-dc \
  --resource-group rg-arc-data \
  --location eastus \
  --connectivity-mode direct \
  --custom-location arc-data-location \
  --profile-name azure-arc-kubeadm \
  --subscription your-subscription-id
# For indirect mode (air-gapped or disconnected environments),
# deploy straight against the Kubernetes API
az arcdata dc create \
  --name arc-dc \
  --resource-group rg-arc-data \
  --location eastus \
  --connectivity-mode indirect \
  --profile-name azure-arc-kubeadm \
  --k8s-namespace arc-data \
  --subscription your-subscription-id \
  --use-k8s
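Deployment takes several minutes. Progress can be followed from the CLI or by watching the pods come up (commands assume the indirect-mode deployment above):
# Check data controller status until it reports Ready
az arcdata dc status show --k8s-namespace arc-data --use-k8s
# Or watch the underlying pods directly
kubectl get pods -n arc-data --watch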
Using a configuration profile:
# data-controller-config.yaml
apiVersion: arcdata.microsoft.com/v1
kind: DataController
metadata:
  name: arc-dc
  namespace: arc-data
spec:
  credentials:
    controllerAdmin: controller-login
    dockerRegistry: arc-private-registry
    serviceAccount: sa-arc-controller
  docker:
    imagePullPolicy: Always
    imageTag: v1.2.0_2021-11-02
    registry: mcr.microsoft.com
    repository: arcdata
  infrastructure: kubernetes
  security:
    allowDumps: true
    allowNodeMetricsCollection: true
    allowPodMetricsCollection: true
  services:
  - name: controller
    port: 30080
    serviceType: LoadBalancer
  settings:
    ElasticSearch:
      vm.max_map_count: "-1"
    azure:
      connectionMode: direct
      location: eastus
      resourceGroup: rg-arc-data
      subscription: your-subscription-id
  storage:
    data:
      accessMode: ReadWriteOnce
      className: managed-premium
      size: 15Gi
    logs:
      accessMode: ReadWriteOnce
      className: managed-premium
      size: 10Gi
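In practice a custom profile like this usually starts from one of the built-in templates and is then passed to dc create. A sketch of that flow (profile name and path are illustrative):
# List the built-in configuration profiles
az arcdata dc config list
# Copy one locally as a starting point for edits
az arcdata dc config init --source azure-arc-kubeadm --path ./custom-profile
# Deploy using the customized profile (indirect mode shown)
az arcdata dc create \
  --path ./custom-profile \
  --name arc-dc \
  --resource-group rg-arc-data \
  --location eastus \
  --connectivity-mode indirect \
  --k8s-namespace arc-data \
  --use-k8s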
Deploying SQL Managed Instance
Create SQL MI
# Admin credentials for the new instance are read from these environment
# variables (values are placeholders)
export AZDATA_USERNAME=arcadmin
export AZDATA_PASSWORD='YourStrong!Password'
# Create SQL Managed Instance (Kubernetes-native, in the data controller namespace)
az sql mi-arc create \
  --name sql-mi-prod \
  --k8s-namespace arc-data \
  --use-k8s \
  --cores-limit 4 \
  --cores-request 2 \
  --memory-limit 8Gi \
  --memory-request 4Gi \
  --storage-class-data managed-premium \
  --storage-class-logs managed-premium \
  --volume-size-data 100Gi \
  --volume-size-logs 20Gi \
  --tier GeneralPurpose \
  --license-type BasePrice
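Provisioning status can be checked either through the CLI or through the SqlManagedInstance custom resource itself:
# List Arc SQL Managed Instances and their state
az sql mi-arc list --k8s-namespace arc-data --use-k8s
# The same information is visible as a Kubernetes custom resource
kubectl get sqlmi -n arc-data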
Using YAML:
# sql-mi.yaml
apiVersion: sql.arcdata.microsoft.com/v1
kind: SqlManagedInstance
metadata:
  name: sql-mi-prod
  namespace: arc-data
spec:
  dev: false
  licenseType: BasePrice
  tier: GeneralPurpose
  scheduling:
    default:
      resources:
        limits:
          cpu: "4"
          memory: 8Gi
        requests:
          cpu: "2"
          memory: 4Gi
  security:
    adminLoginSecret: sql-login-secret
  services:
    primary:
      type: LoadBalancer
      port: 1433
  storage:
    data:
      volumes:
      - className: managed-premium
        size: 100Gi
    logs:
      volumes:
      - className: managed-premium
        size: 20Gi
    backups:
      volumes:
      - className: azurefile
        size: 50Gi
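The adminLoginSecret above refers to a Kubernetes secret that has to exist before the manifest is applied. A minimal sketch, assuming username/password keys and placeholder credentials:
# Create the admin login secret the manifest references
kubectl create secret generic sql-login-secret -n arc-data \
  --from-literal=username=arcadmin \
  --from-literal=password='YourStrong!Password'
# Apply the SqlManagedInstance manifest
kubectl apply -f sql-mi.yaml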
Connect and Use SQL MI
# Get the connection details (the primary endpoint is reported in the status)
az sql mi-arc show \
  --name sql-mi-prod \
  --k8s-namespace arc-data \
  --use-k8s
# Connect using sqlcmd with the admin login set at creation time
sqlcmd -S 10.0.0.100,1433 -U arcadmin -P 'YourPassword' -Q "SELECT @@VERSION"
Application connection string:
using Microsoft.Data.SqlClient;
var connectionString = "Server=10.0.0.100,1433;Database=MyDatabase;User Id=arcadmin;Password=YourPassword;Encrypt=True;TrustServerCertificate=True";
using var connection = new SqlConnection(connectionString);
await connection.OpenAsync();
using var command = new SqlCommand("SELECT TOP 10 * FROM Customers", connection);
using var reader = await command.ExecuteReaderAsync();
while (await reader.ReadAsync())
{
Console.WriteLine($"{reader["Id"]}: {reader["Name"]}");
}
High Availability Configuration
Business Critical Tier with Availability Groups
apiVersion: sql.arcdata.microsoft.com/v1
kind: SqlManagedInstance
metadata:
  name: sql-mi-ha
  namespace: arc-data
spec:
  tier: BusinessCritical
  replicas: 3
  scheduling:
    default:
      resources:
        limits:
          cpu: "8"
          memory: 32Gi
        requests:
          cpu: "4"
          memory: 16Gi
  services:
    primary:
      type: LoadBalancer
    readableSecondaries:
      type: LoadBalancer
  storage:
    data:
      volumes:
      - className: managed-premium
        size: 500Gi
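After the instance comes up, it is worth confirming that all three replicas are running before pointing traffic at the readable secondaries:
# The custom resource reports overall state and replica readiness
kubectl get sqlmi sql-mi-ha -n arc-data
# Each replica runs as its own pod
kubectl get pods -n arc-data | grep sql-mi-ha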
Query read replicas:
-- Connect to readable secondary
-- Connection string: Server=10.0.0.101,1433;ApplicationIntent=ReadOnly
-- Check replica role
SELECT
    ar.replica_server_name,
    ars.role_desc,
    ars.synchronization_health_desc
FROM sys.dm_hadr_availability_replica_states ars
JOIN sys.availability_replicas ar
    ON ars.replica_id = ar.replica_id;
PostgreSQL Hyperscale
Deploy PostgreSQL
# Create a PostgreSQL Hyperscale server group (coordinator plus workers)
az postgres arc-server create \
  --name pg-hyperscale \
  --k8s-namespace arc-data \
  --use-k8s \
  --cores-limit 4 \
  --cores-request 2 \
  --memory-limit 8Gi \
  --memory-request 4Gi \
  --workers 2 \
  --storage-class-data managed-premium \
  --volume-size-data 100Gi
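As with SQL MI, the server group can be inspected from the CLI once it has been created:
# List PostgreSQL Hyperscale server groups in the namespace
az postgres arc-server list --k8s-namespace arc-data --use-k8s
# Show details (including endpoints and worker count) for one server group
az postgres arc-server show --name pg-hyperscale --k8s-namespace arc-data --use-k8s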
Scale Out with Workers
# Add worker nodes and scale up compute
az postgres arc-server edit \
  --name pg-hyperscale \
  --k8s-namespace arc-data \
  --workers 4 \
  --cores-limit 8 \
  --use-k8s
Distributed Tables with Citus
-- Connect to coordinator
psql -h 10.0.0.200 -p 5432 -U postgres -d citus
-- Create distributed table
CREATE TABLE events (
    event_id bigserial,
    tenant_id int NOT NULL,
    event_type text,
    event_data jsonb,
    created_at timestamptz DEFAULT now()
);
-- Distribute by tenant_id
SELECT create_distributed_table('events', 'tenant_id');
-- Create reference table (replicated to all nodes)
CREATE TABLE event_types (
    type_id serial PRIMARY KEY,
    type_name text,
    description text
);
SELECT create_reference_table('event_types');
-- Query runs in parallel across workers
SELECT
    tenant_id,
    event_type,
    COUNT(*) AS event_count
FROM events
WHERE created_at > now() - interval '1 day'
GROUP BY tenant_id, event_type
ORDER BY event_count DESC
LIMIT 100;
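To confirm the distribution actually spread shards across the workers, the Citus metadata can be queried from the coordinator (same placeholder address as above):
# Count shard placements per worker node
psql -h 10.0.0.200 -p 5432 -U postgres -d citus \
  -c "SELECT nodename, count(*) AS shard_count FROM pg_dist_shard_placement GROUP BY nodename;"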
Monitoring and Logs
View Metrics in Azure Portal
# Export metrics (indirect mode)
az arcdata dc export \
  --type metrics \
  --path metrics.json \
  --k8s-namespace arc-data \
  --use-k8s
# Upload the exported file to Azure
az arcdata dc upload \
  --path metrics.json
Grafana Dashboards
# Get Grafana endpoint
kubectl get svc -n arc-data | grep grafana
# Port forward for local access
kubectl port-forward svc/metricsui-external-svc -n arc-data 3000:3000
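The data controller also exposes a Kibana endpoint for logs, and all of the monitoring endpoints can be listed through the CLI rather than hunting through services:
# List Grafana, Kibana, and controller endpoints for the data controller
az arcdata dc endpoint list --k8s-namespace arc-data --use-k8s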
Log Analytics Integration
# Enable automatic upload of logs and metrics to Log Analytics (direct mode)
az arcdata dc update \
  --name arc-dc \
  --resource-group rg-arc-data \
  --auto-upload-logs true \
  --auto-upload-metrics true
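In indirect mode there is no automatic upload, so logs are exported and pushed on a schedule instead. A sketch, assuming WORKSPACE_ID and WORKSPACE_SHARED_KEY are the environment variables the upload command reads for the workspace credentials:
# Log Analytics workspace credentials (placeholders)
export WORKSPACE_ID='<log-analytics-workspace-id>'
export WORKSPACE_SHARED_KEY='<log-analytics-primary-key>'
# Export and upload logs (same pattern as the metrics export above)
az arcdata dc export --type logs --path logs.json --k8s-namespace arc-data --use-k8s
az arcdata dc upload --path logs.json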
KQL queries:
// SQL MI performance
SqlManagedInstance_sqlmi_logs_CL
| where TimeGenerated > ago(1h)
| where instance_name_s == "sql-mi-prod"
| project TimeGenerated, cpu_percent_d, memory_percent_d, io_percent_d
// PostgreSQL queries
PostgreSQL_pg_stat_statements_CL
| where TimeGenerated > ago(24h)
| summarize
TotalCalls = sum(calls_d),
AvgTime = avg(mean_time_d)
by query_s
| order by TotalCalls desc
| take 20
Backup and Restore
Point-in-Time Restore
# Validate the restore point first with a dry run
az sql midb-arc restore \
  --managed-instance sql-mi-prod \
  --name AdventureWorks \
  --dest-name AdventureWorks-restored \
  --time "2021-11-18T10:30:00Z" \
  --k8s-namespace arc-data \
  --use-k8s \
  --dry-run
# Restore the database to a point in time as a new database on the same instance
az sql midb-arc restore \
  --managed-instance sql-mi-prod \
  --name AdventureWorks \
  --dest-name AdventureWorks-restored \
  --time "2021-11-18T10:30:00Z" \
  --k8s-namespace arc-data \
  --use-k8s
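Once the restore completes, the new database should show up alongside the original (admin login and endpoint are the placeholders used earlier):
# Confirm the restored database exists
sqlcmd -S 10.0.0.100,1433 -U arcadmin -P 'YourPassword' \
  -Q "SELECT name, create_date FROM sys.databases"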
Automated Backups
# Backup configuration
apiVersion: sql.arcdata.microsoft.com/v1
kind: SqlManagedInstance
metadata:
  name: sql-mi-prod
spec:
  backup:
    retentionPeriodInDays: 14
    fullBackupDays: 7
    differentialBackupHours: 12
    transactionLogBackupMinutes: 5
  storage:
    backups:
      volumes:
      - className: azurefile
        size: 100Gi
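Because the backup policy lives in the custom resource spec, retention can be adjusted on a running instance with a patch rather than a redeploy. A minimal sketch against the spec shown above:
# Raise backup retention to 30 days on the existing instance
kubectl patch sqlmi sql-mi-prod -n arc-data --type merge \
  -p '{"spec":{"backup":{"retentionPeriodInDays":30}}}'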
Arc-enabled data services bring cloud-native database capabilities to any environment. Whether you need SQL Server or PostgreSQL, you can run managed database services on your own infrastructure while maintaining Azure integration.