
Power BI Automatic Aggregations: AI-Powered Query Optimization

Automatic aggregations in Power BI use machine learning to build and maintain an in-memory cache of aggregated data for DirectQuery models. This removes the manual work of designing user-defined aggregation tables while speeding up report queries.

How Automatic Aggregations Work

Power BI analyzes query patterns and automatically:

  1. Identifies frequently used groupings and measures
  2. Creates optimized aggregation tables
  3. Routes queries to aggregations when possible
  4. Maintains aggregations during refresh
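
Step 3 can be pictured as a coverage check: an aggregation can answer a query when it contains every grouping column and measure the query asks for. The sketch below is a simplified mental model, not Power BI's actual matching logic. The names `Aggregation`, `can_answer`, and `route` are hypothetical, and it assumes additive measures (sums and counts) that can be rolled up from a finer grain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Aggregation:
    """Hypothetical stand-in for one cached aggregation."""
    name: str
    group_by: frozenset
    measures: frozenset
    row_count: int


def can_answer(agg, group_by, measures):
    """An aggregation covers a query if it holds every requested column and measure."""
    return set(group_by) <= agg.group_by and set(measures) <= agg.measures


def route(aggs, group_by, measures):
    """Return the smallest covering aggregation's name, or None to fall back to the source."""
    covering = [a for a in aggs if can_answer(a, group_by, measures)]
    if not covering:
        return None
    return min(covering, key=lambda a: a.row_count).name


# Made-up aggregations for illustration only
aggs = [
    Aggregation("agg_year_category",
                frozenset({"Year", "Category"}),
                frozenset({"TotalSales"}), 500),
    Aggregation("agg_year_quarter_category",
                frozenset({"Year", "Quarter", "Category"}),
                frozenset({"TotalSales", "TotalQuantity"}), 2000),
]

print(route(aggs, ["Year"], ["TotalSales"]))
print(route(aggs, ["Year", "Quarter"], ["TotalQuantity"]))
print(route(aggs, ["Region"], ["TotalSales"]))
```

A query grouped by a column that no aggregation contains (here `Region`) gets no match and would go to the DirectQuery source instead.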

Enabling Automatic Aggregations

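Automatic aggregations are enabled per model in the Power BI service (Premium), in the model's settings, rather than in Power BI Desktop. The options involved can be summarized as: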

{
  "table": "FactSales",
  "storageMode": "DirectQuery",
  "automaticAggregations": {
    "enabled": true,
    "training": {
      "trainingWindow": "30 days",
      "sampleQueries": true
    },
    "aggregationTables": {
      "maxCount": 10,
      "maxRows": 10000000
    }
  }
}

Configuration via XMLA

<Alter xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>SalesAnalytics</DatabaseID>
  </Object>
  <ObjectDefinition>
    <Database>
      <AutomaticAggregation>
        <Enabled>true</Enabled>
        <TrainingDays>30</TrainingDays>
        <MaxTableCount>10</MaxTableCount>
        <MaxTableRows>10000000</MaxTableRows>
      </AutomaticAggregation>
    </Database>
  </ObjectDefinition>
</Alter>

Monitoring Aggregation Usage

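// Check aggregation cache hits
// Assumes INFO.AUTOMANAGGREGATIONS() is available in your environment;
// it is not part of the documented INFO function set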
EVALUATE
SUMMARIZE(
    INFO.AUTOMANAGGREGATIONS(),
    [TableName],
    [AggTableName],
    [QueryCount],
    [HitCount],
    [HitRatio]
)
ORDER BY [QueryCount] DESC

Query Performance Analysis

// Log Analytics query for aggregation effectiveness
PowerBIDatasetQuery
| where DatasetId == "dataset-guid"
| extend
    UsedAggregation = tostring(parse_json(Properties)["AggregationUsed"]),
    QueryDuration = DurationMs
| summarize
    TotalQueries = count(),
    AggregationHits = countif(UsedAggregation == "true"),
    AvgDurationWithAgg = avgif(QueryDuration, UsedAggregation == "true"),
    AvgDurationWithoutAgg = avgif(QueryDuration, UsedAggregation != "true")
    by bin(TimeGenerated, 1h)
| extend HitRatio = todouble(AggregationHits) / TotalQueries * 100
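
The same arithmetic can be checked offline once the query rows are exported. The sketch below is a minimal illustration with made-up sample rows; `summarize_hits` is a hypothetical helper, and each row is assumed to be a `(used_aggregation, duration_ms)` pair.

```python
def summarize_hits(rows):
    """Compute hit ratio and average durations from (used_aggregation, duration_ms) pairs."""
    rows = list(rows)  # allow generators, since the data is scanned twice
    hit = [d for used, d in rows if used]
    miss = [d for used, d in rows if not used]
    total = len(rows)
    return {
        "hit_ratio_pct": 100.0 * len(hit) / total if total else 0.0,
        "avg_ms_with_agg": sum(hit) / len(hit) if hit else None,
        "avg_ms_without_agg": sum(miss) / len(miss) if miss else None,
    }


# Made-up sample: three queries served from an aggregation, one that was not
sample = [(True, 120), (True, 80), (False, 900), (True, 100)]
print(summarize_hits(sample))
```

Returning `None` for an empty group keeps "no data" distinct from a genuine average of zero.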

Training Period Considerations

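import re


class AggregationTrainer:
    def __init__(self, dataset_connection):
        # Any client object exposing execute(query) -> iterable of row dicts
        self.connection = dataset_connection

    def analyze_query_patterns(self):
        """Analyze query patterns to understand aggregation opportunities."""
        # Assumes the query log is exposed as a DMV-style view with these columns.
        # The filter keeps long-running queries (Duration in milliseconds).
        query = """
        SELECT
            [Query],
            [StartTime],
            [EndTime],
            [Duration],
            [User]
        FROM $SYSTEM.DISCOVER_QUERIES
        WHERE [Duration] > 1000
        """
        return self.connection.execute(query)

    @staticmethod
    def _extract_group_by_columns(query):
        """Rough heuristic: collect 'Table'[Column] references, skipping those that
        directly follow '(' (aggregate arguments such as SUM('Sales'[Amount]))."""
        return sorted(set(re.findall(r"(?<!\()'[^']+'\[[^\]]+\]", query["Query"])))

    @staticmethod
    def _extract_measures(query):
        """Rough heuristic: collect the quoted names given to measure expressions."""
        return sorted(set(re.findall(r'"([^"]+)"\s*,', query["Query"])))

    def identify_aggregation_candidates(self, queries):
        """Identify columns and measures for aggregation."""
        candidates = {}

        for query in queries:
            # Parse query to extract grouping columns
            columns = self._extract_group_by_columns(query)
            measures = self._extract_measures(query)

            key = frozenset(columns)
            if key not in candidates:
                candidates[key] = {
                    "columns": columns,
                    "measures": set(),
                    "frequency": 0,
                    "total_duration": 0
                }

            # Keep measures from every query that shares this grouping,
            # not only the first one seen
            candidates[key]["measures"].update(measures)
            candidates[key]["frequency"] += 1
            candidates[key]["total_duration"] += query["Duration"]

        # Rank by impact
        ranked = sorted(
            candidates.values(),
            key=lambda x: x["frequency"] * x["total_duration"],
            reverse=True
        )

        return ranked[:10]  # Top 10 candidates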

Manual Aggregation Hints

Although automatic aggregations are managed for you, training is driven by the query log, so running representative queries acts as a hint:

// Representative query: running it adds this grouping pattern to the query log
EVALUATE
SUMMARIZECOLUMNS(
    'Date'[Year],
    'Date'[Quarter],
    'Product'[Category],
    'Geography'[Region],
    "TotalSales", SUM('Sales'[Amount]),
    "TotalQuantity", SUM('Sales'[Quantity]),
    "DistinctCustomers", DISTINCTCOUNT('Sales'[CustomerID])
)
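
If hint queries like the one above are issued regularly, generating them from a list of columns and measures keeps them consistent. The helper below is a hypothetical sketch that only builds the query text; `build_seed_query` is not part of any Power BI library, and executing the result against the model is left to whichever client you use.

```python
def build_seed_query(group_by, measures):
    """Build the text of a SUMMARIZECOLUMNS query.

    group_by: list of column references such as "'Date'[Year]"
    measures: dict mapping a result name to a DAX expression
    """
    args = list(group_by)
    args += [f'"{name}", {expr}' for name, expr in measures.items()]
    body = ",\n    ".join(args)
    return f"EVALUATE\nSUMMARIZECOLUMNS(\n    {body}\n)"


query_text = build_seed_query(
    ["'Date'[Year]", "'Product'[Category]"],
    {"TotalSales": "SUM('Sales'[Amount])"},
)
print(query_text)
```

Keeping the grouping columns and measures as data makes it easy to review which patterns are being promoted to the training process.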

Comparing with User-Defined Aggregations

Feature       | Automatic        | User-Defined
Setup         | Automatic        | Manual
Optimization  | AI-driven        | Human expertise
Maintenance   | Self-maintaining | Requires updates
Flexibility   | Query-based      | Custom logic
Transparency  | Less visible     | Fully documented

Best Practices

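// Capacity configuration for optimal aggregations
// Microsoft.PowerBIDedicated provisions Power BI Embedded (A SKU) capacities;
// A4 is the A SKU equivalent of P1
param location string = resourceGroup().location

resource premiumCapacity 'Microsoft.PowerBIDedicated/capacities@2021-01-01' = {
  name: 'premiumcapacity' // lowercase letters and digits only
  location: location
  sku: {
    name: 'A4'
    tier: 'PBIE_Azure'
  }
  properties: {
    administration: {
      members: ['admin@company.com']
    }
  }
}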

Workload settings:

{
  "workloadSettings": {
    "dataflows": {
      "maxMemoryPercentage": 20
    },
    "paginated": {
      "maxMemoryPercentage": 10
    },
    "automaticAggregations": {
      "maxMemoryPercentage": 25,
      "maxCacheSize": "50GB"
    }
  }
}

Troubleshooting

// Check why aggregation wasn't used
EVALUATE
ROW(
    "DetailRowCount", COUNTROWS(FactSales),
    "AggregationAvailable", NOT ISEMPTY(INFO.AUTOMANAGGREGATIONS()),
    "QueryComplexity", "Check if query is too complex for aggregation"
)

Automatic aggregations bring AI-powered optimization to Power BI, making it easier than ever to achieve excellent query performance on large datasets.

Michael John Peña


Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.