Azure Managed Grafana: Visualization for Cloud Observability

Running Grafana on Azure became substantially easier when Azure Managed Grafana reached general availability. Before that, teams ran Grafana on AKS or a VM, managed their own TLS, authentication, and version upgrades, and often ended up with an instance that nobody had updated for two years. Managed Grafana handles the hosting, authentication (Azure AD SSO is built in), plugin management, and high availability. With the native Azure Monitor data source, you point it at your subscription and the metrics are browsable immediately. I pair it with Azure Monitor managed service for Prometheus for Kubernetes observability: Prometheus handles the scraping, Azure stores the metrics, and Grafana is the query and dashboard layer.

Why Managed Grafana?

Self-hosting Grafana requires:

  • Infrastructure management
  • Authentication configuration
  • Plugin management
  • High availability setup
  • Security patching

Azure Managed Grafana handles all of this while integrating natively with Azure AD and Azure Monitor.

Creating a Managed Grafana Instance

# Create Managed Grafana instance
az grafana create \
    --name my-grafana \
    --resource-group my-rg \
    --location eastus \
    --sku-tier Standard

# Get Grafana endpoint
az grafana show \
    --name my-grafana \
    --resource-group my-rg \
    --query properties.endpoint

# Assign yourself Grafana Admin role
GRAFANA_ID=$(az grafana show --name my-grafana --resource-group my-rg --query id -o tsv)
USER_ID=$(az ad signed-in-user show --query id -o tsv)

az role assignment create \
    --assignee "$USER_ID" \
    --role "Grafana Admin" \
    --scope "$GRAFANA_ID"

Configuring Data Sources

Azure Monitor

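The Azure Monitor data source is pre-configured and authenticates with the instance's managed identity. To let it read metrics from a subscription, grant that identity the Monitoring Reader role: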

# Grant Grafana access to subscription metrics
GRAFANA_IDENTITY=$(az grafana show \
    --name my-grafana \
    --resource-group my-rg \
    --query identity.principalId -o tsv)

az role assignment create \
    --assignee "$GRAFANA_IDENTITY" \
    --role "Monitoring Reader" \
    --scope "/subscriptions/<subscription-id>"

Azure Data Explorer

{
  "name": "Azure Data Explorer",
  "type": "grafana-azure-data-explorer-datasource",
  "access": "proxy",
  "jsonData": {
    "clusterUrl": "https://mycluster.eastus.kusto.windows.net",
    "tenantId": "<tenant-id>",
    "clientId": "<client-id>",
    "defaultDatabase": "mydb"
  },
  "secureJsonData": {
    "clientSecret": "<client-secret>"
  }
}

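Prometheus

This example points at a self-managed Prometheus server. The URL is a cluster-internal Kubernetes service address, which resolves only from inside the cluster, so replace it with an endpoint your Managed Grafana instance can reach:
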
{
  "name": "Prometheus",
  "type": "prometheus",
  "access": "proxy",
  "url": "http://prometheus-server.monitoring.svc.cluster.local:80",
  "jsonData": {
    "httpMethod": "POST",
    "timeInterval": "15s"
  }
}
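
A data source definition like the ones above can be registered through Grafana's data source HTTP API (`POST /api/datasources`). The following is a minimal sketch, not a definitive implementation: the helper name `build_datasource_request` is hypothetical, the endpoint and token are placeholders, and it only builds the request so nothing is sent until you call `urlopen`.

```python
import json
import urllib.request


def build_datasource_request(grafana_url: str, token: str, datasource: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to Grafana's /api/datasources endpoint."""
    body = json.dumps(datasource).encode("utf-8")
    return urllib.request.Request(
        url=f"{grafana_url.rstrip('/')}/api/datasources",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Same Prometheus definition as in the JSON above
prometheus_ds = {
    "name": "Prometheus",
    "type": "prometheus",
    "access": "proxy",
    "url": "http://prometheus-server.monitoring.svc.cluster.local:80",
    "jsonData": {"httpMethod": "POST", "timeInterval": "15s"},
}

req = build_datasource_request(
    "https://my-grafana-xxxx.grafana.azure.com",  # placeholder endpoint
    "your-api-key",                               # placeholder token
    prometheus_ds,
)
print(req.get_method(), req.full_url)
# To actually create the data source: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit test before it touches a live instance.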

Creating Dashboards via API

Use Terraform or the Grafana HTTP API to manage dashboards as code. The example below uses the API:

import requests

grafana_url = "https://my-grafana-xxxx.grafana.azure.com"
api_key = "your-api-key"  # Create via Grafana UI; read it from an environment variable in real use

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

dashboard = {
    "dashboard": {
        "id": None,
        "uid": "azure-vms-overview",
        "title": "Azure VMs Overview",
        "tags": ["azure", "infrastructure"],
        "timezone": "browser",
        "schemaVersion": 30,
        "version": 0,
        "refresh": "30s",
        "panels": [
            {
                "id": 1,
                "title": "CPU Percentage by VM",
                "type": "timeseries",
                "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
                "datasource": "Azure Monitor",
                "targets": [
                    {
                        "refId": "A",
                        "queryType": "Azure Monitor",
                        "azureMonitor": {
                            "resourceGroup": "$resourceGroup",
                            "resourceName": "$vmName",
                            "metricNamespace": "Microsoft.Compute/virtualMachines",
                            "metricName": "Percentage CPU",
                            "aggregation": "Average",
                            "timeGrain": "auto"
                        }
                    }
                ]
            },
            {
                "id": 2,
                "title": "Available Memory",
                "type": "gauge",
                "gridPos": {"h": 8, "w": 6, "x": 12, "y": 0},
                "datasource": "Azure Monitor",
                "options": {
                    "showThresholdLabels": False,
                    "showThresholdMarkers": True
                },
                "fieldConfig": {
                    "defaults": {
                        "thresholds": {
                            "mode": "percentage",
                            "steps": [
                                {"color": "red", "value": None},
                                {"color": "orange", "value": 20},
                                {"color": "green", "value": 40}
                            ]
                        },
                        "unit": "bytes",
                        "min": 0
                    }
                },
                "targets": [
                    {
                        "refId": "A",
                        "azureMonitor": {
                            "resourceGroup": "$resourceGroup",
                            "resourceName": "$vmName",
                            "metricNamespace": "Microsoft.Compute/virtualMachines",
                            "metricName": "Available Memory Bytes",
                            "aggregation": "Average"
                        }
                    }
                ]
            }
        ],
        "templating": {
            "list": [
                {
                    "name": "resourceGroup",
                    "type": "query",
                    "datasource": "Azure Monitor",
                    "query": "ResourceGroups()",
                    "refresh": 1
                },
                {
                    "name": "vmName",
                    "type": "query",
                    "datasource": "Azure Monitor",
                    "query": "ResourceNames($resourceGroup, Microsoft.Compute/virtualMachines)",
                    "refresh": 1
                }
            ]
        }
    },
    "folderId": 0,
    "overwrite": True
}

response = requests.post(
    f"{grafana_url}/api/dashboards/db",
    headers=headers,
    json=dashboard,
    timeout=30,
)
response.raise_for_status()  # Fail on 4xx/5xx instead of printing an error body

print(response.json())
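
The two panels above repeat most of their structure, so a small helper can cut the duplication when a dashboard grows. This is a sketch under assumptions: `azure_monitor_panel` is a hypothetical helper name, and the fields simply mirror the panel dictionaries in the example above rather than the full Grafana panel schema.

```python
def azure_monitor_panel(panel_id, title, metric_name, x, y, w=12, h=8, panel_type="timeseries"):
    """Return a panel dict shaped like the Azure Monitor panels in the example above."""
    return {
        "id": panel_id,
        "title": title,
        "type": panel_type,
        "gridPos": {"h": h, "w": w, "x": x, "y": y},
        "datasource": "Azure Monitor",
        "targets": [
            {
                "refId": "A",
                "queryType": "Azure Monitor",
                "azureMonitor": {
                    # Dashboard variables defined in the templating section
                    "resourceGroup": "$resourceGroup",
                    "resourceName": "$vmName",
                    "metricNamespace": "Microsoft.Compute/virtualMachines",
                    "metricName": metric_name,
                    "aggregation": "Average",
                    "timeGrain": "auto",
                },
            }
        ],
    }


panels = [
    azure_monitor_panel(1, "CPU Percentage by VM", "Percentage CPU", x=0, y=0),
    azure_monitor_panel(2, "Available Memory", "Available Memory Bytes", x=12, y=0, w=6, panel_type="gauge"),
]
print([p["title"] for p in panels])
```

The resulting list can be assigned to the `"panels"` key of the dashboard model; panel-specific settings such as gauge thresholds would still be added per panel.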

Log Analytics Queries

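Query Azure Log Analytics from Grafana. The queries below use the classic Application Insights schema (the requests table with a timestamp column); for a workspace-based resource queried through Log Analytics, the equivalent table is AppRequests and the time column is TimeGenerated: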

// Panel query for Application Insights requests
requests
| where timestamp > ago(1h)
| summarize count() by bin(timestamp, 5m), resultCode
| order by timestamp asc

// Error rate panel
requests
| where timestamp > ago(1h)
| summarize
    total = count(),
    errors = countif(success == false)
    by bin(timestamp, 5m)
| extend error_rate = errors * 100.0 / total
| project timestamp, error_rate
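
To make the error rate arithmetic concrete, here is a small Python sketch of what the second query computes: floor each timestamp to a 5-minute bin, count totals and failures per bin, then take `errors * 100.0 / total`. The function name and the sample data are made up for illustration; this mirrors the query's logic and is not how Grafana or Kusto executes it.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def error_rate_by_bin(request_log, bin_size=timedelta(minutes=5)):
    """request_log: iterable of (timestamp, success) tuples with timezone-aware timestamps."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for ts, success in request_log:
        # Floor the timestamp to the start of its bin, like bin(timestamp, 5m)
        bin_start = ts - ((ts - EPOCH) % bin_size)
        totals[bin_start] += 1
        if not success:
            errors[bin_start] += 1
    # error_rate = errors * 100.0 / total, one value per bin, ordered by time
    return {b: errors[b] * 100.0 / totals[b] for b in sorted(totals)}


t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sample = [
    (t0 + timedelta(minutes=1), True),
    (t0 + timedelta(minutes=2), False),
    (t0 + timedelta(minutes=4), True),
    (t0 + timedelta(minutes=4), True),
    (t0 + timedelta(minutes=6), False),
    (t0 + timedelta(minutes=7), True),
]
rates = error_rate_by_bin(sample)
for bin_start, rate in rates.items():
    print(bin_start.isoformat(), rate)
```

Bins with no requests never appear in the result, which matches the query: `summarize` only emits rows for bins that contain data, so the division by `total` never sees zero.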

Alerting

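Configure alerts in Grafana. In the rule group below, query A reads the average Percentage CPU for the web-server VM, and expression B is a classic condition that fires when the average of A is above 80: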

alert_rule = {
    "name": "High CPU Alert",
    "interval": "1m",
    "rules": [
        {
            "grafana_alert": {
                "condition": "B",
                "data": [
                    {
                        "refId": "A",
                        "datasourceUid": "azure-monitor-uid",
                        "model": {
                            "azureMonitor": {
                                "aggregation": "Average",
                                "metricName": "Percentage CPU",
                                "metricNamespace": "Microsoft.Compute/virtualMachines",
                                "resourceGroup": "production",
                                "resourceName": "web-server"
                            }
                        }
                    },
                    {
                        "refId": "B",
                        "datasourceUid": "__expr__",
                        "model": {
                            "conditions": [
                                {
                                    "evaluator": {
                                        "params": [80],
                                        "type": "gt"
                                    },
                                    "operator": {"type": "and"},
                                    "query": {"params": ["A"]},
                                    "reducer": {"params": [], "type": "avg"},
                                    "type": "query"
                                }
                            ],
                            "type": "classic_conditions"
                        }
                    }
                ],
                "exec_err_state": "Error",
                "no_data_state": "NoData",
                "title": "High CPU on Web Server"
            }
        }
    ]
}
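
The condition in expression B combines a reducer (`avg`) with an evaluator (`gt`, parameter 80). As a rough illustration of that logic only, the sketch below applies the same reduce-then-compare step to a list of samples. The function name is hypothetical, and Grafana's real evaluation works per series over the query's time range, so treat this as a simplified model.

```python
from statistics import mean


def classic_condition_fires(samples, threshold=80.0):
    """Reduce samples with avg, then compare with a strict greater-than.

    Returns None when there are no samples, standing in for a no-data result.
    """
    if not samples:
        return None
    return mean(samples) > threshold


print(classic_condition_fires([70, 85, 95]))  # average is above 80
print(classic_condition_fires([60, 70, 80]))  # average is 70
print(classic_condition_fires([]))            # no data
```

Because the comparison is strict, an average of exactly 80 does not fire; lower the threshold slightly if that boundary matters for your alert.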

Dashboard Best Practices

Use Variables for Flexibility

{
  "templating": {
    "list": [
      {
        "name": "subscription",
        "type": "query",
        "query": "Subscriptions()",
        "multi": false
      },
      {
        "name": "resourceGroup",
        "type": "query",
        "query": "ResourceGroups($subscription)",
        "multi": true
      },
      {
        "name": "timeRange",
        "type": "interval",
        "options": [
          {"text": "1h", "value": "1h"},
          {"text": "6h", "value": "6h"},
          {"text": "24h", "value": "24h"},
          {"text": "7d", "value": "7d"}
        ]
      }
    ]
  }
}

Organize with Rows

{
  "panels": [
    {
      "type": "row",
      "title": "CPU Metrics",
      "collapsed": false,
      "gridPos": {"h": 1, "w": 24, "x": 0, "y": 0}
    },
    {
      "id": 1,
      "title": "CPU Usage",
      "gridPos": {"h": 8, "w": 12, "x": 0, "y": 1}
    },
    {
      "type": "row",
      "title": "Memory Metrics",
      "collapsed": true,
      "gridPos": {"h": 1, "w": 24, "x": 0, "y": 9},
      "panels": [
        {
          "id": 2,
          "title": "Memory Usage",
          "gridPos": {"h": 8, "w": 12, "x": 0, "y": 10}
        }
      ]
    }
  ]
}
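
The `gridPos` values above follow Grafana's 24-column grid: `x` and `w` are in columns, and `y` and `h` are in grid rows. When generating dashboards from code, computing these positions by hand gets tedious, so here is a minimal sketch of a left-to-right layout helper. `layout_panels` is a hypothetical name, and the wrapping strategy is one simple choice among many.

```python
GRID_WIDTH = 24  # Grafana dashboards use a 24-column grid


def layout_panels(panels, start_y=0):
    """Assign gridPos left to right, wrapping to a new line when the grid is full.

    panels: list of (title, width, height) tuples.
    """
    x, y, line_height = 0, start_y, 0
    placed = []
    for panel_id, (title, w, h) in enumerate(panels, start=1):
        if x + w > GRID_WIDTH:
            # Panel does not fit on this line: move below the tallest panel so far
            x, y, line_height = 0, y + line_height, 0
        placed.append({"id": panel_id, "title": title, "gridPos": {"h": h, "w": w, "x": x, "y": y}})
        x += w
        line_height = max(line_height, h)
    return placed


# start_y=1 leaves room for a row header at y=0, as in the example above
placed = layout_panels(
    [("CPU Usage", 12, 8), ("CPU Credits", 12, 8), ("Memory Usage", 12, 8)],
    start_y=1,
)
for panel in placed:
    print(panel["title"], panel["gridPos"])
```

The third panel wraps to `y = 9`, directly below the first line of 8-unit-tall panels.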

Sharing and Embedding

Public Snapshots

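# Create a snapshot from a dashboard model.
# Here we reuse the inner dashboard object from the earlier API example.
dashboard_json = dashboard["dashboard"]

snapshot = {
    "dashboard": dashboard_json,
    "expires": 86400  # Seconds until the snapshot expires (24 hours)
}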

response = requests.post(
    f"{grafana_url}/api/snapshots",
    headers=headers,
    json=snapshot
)

print(f"Snapshot URL: {response.json()['url']}")

Embedding Panels

<!-- Embed a single panel in external application -->
<iframe
  src="https://my-grafana-xxxx.grafana.azure.com/d-solo/azure-vms-overview/azure-vms?orgId=1&panelId=1&from=now-1h&to=now"
  width="800"
  height="400"
  frameborder="0">
</iframe>
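
If you generate embed snippets for several panels, building the `d-solo` URL in code avoids hand-editing query strings. This is a small sketch that reproduces the URL shape used in the iframe above; `solo_panel_url` is a hypothetical helper and the host is a placeholder.

```python
from urllib.parse import urlencode


def solo_panel_url(base_url, dashboard_uid, slug, panel_id,
                   time_from="now-1h", time_to="now", org_id=1):
    """Build a single-panel (d-solo) URL for embedding."""
    query = urlencode({
        "orgId": org_id,
        "panelId": panel_id,
        "from": time_from,
        "to": time_to,
    })
    return f"{base_url.rstrip('/')}/d-solo/{dashboard_uid}/{slug}?{query}"


url = solo_panel_url(
    "https://my-grafana-xxxx.grafana.azure.com",  # placeholder endpoint
    "azure-vms-overview",
    "azure-vms",
    panel_id=1,
)
print(url)
```

`urlencode` also takes care of escaping if you later pass absolute timestamps or variable values that contain reserved characters.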

Terraform Configuration

resource "azurerm_dashboard_grafana" "main" {
  name                = "my-grafana"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location

  sku                                = "Standard"
  zone_redundancy_enabled           = true
  api_key_enabled                   = true
  deterministic_outbound_ip_enabled = true

  identity {
    type = "SystemAssigned"
  }

  azure_monitor_workspace_integrations {
    resource_id = azurerm_monitor_workspace.main.id
  }

  tags = {
    environment = "production"
  }
}

resource "azurerm_role_assignment" "grafana_monitoring" {
  scope                = data.azurerm_subscription.current.id
  role_definition_name = "Monitoring Reader"
  principal_id         = azurerm_dashboard_grafana.main.identity[0].principal_id
}

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.