Fabric Copilot Updates: AI-Powered Data Analytics
Fabric Copilot has received significant updates. Today I’m exploring how AI assistants are transforming data work in Microsoft Fabric.
Copilot Capabilities in Fabric
Fabric Copilot:
├── Notebooks (code generation)
├── Data Warehouse (SQL generation)
├── Power BI (report building)
├── Data Factory (pipeline creation)
└── Real-Time Intelligence (KQL queries)
Copilot for Notebooks
Code Generation
# User prompt: "Load sales data from the lakehouse and show top 10 products by revenue"
# Copilot generates:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sum as spark_sum
# Initialize Spark session
spark = SparkSession.builder.getOrCreate()
# Load sales data from Lakehouse
sales_df = spark.read.format("delta").load("Tables/sales")
# Calculate revenue by product and get top 10
top_products = (
    sales_df
    .groupBy("product_id", "product_name")
    .agg(spark_sum(col("quantity") * col("unit_price")).alias("total_revenue"))
    .orderBy(col("total_revenue").desc())
    .limit(10)
)
# Display results
display(top_products)
Code Explanation
# User prompt: "Explain this code"
# Original code:
from pyspark.sql import functions as F

result = df.groupBy("category").agg(
    F.count("*").alias("count"),
    F.avg("price").alias("avg_price"),
    F.percentile_approx("price", 0.5).alias("median_price")
).filter(F.col("count") > 100)
# Copilot explains:
"""
This code performs the following operations:
1. Groups the DataFrame by the 'category' column
2. Calculates aggregations for each group:
- count: Total number of rows
- avg_price: Average of the 'price' column
- median_price: Approximate median (50th percentile) of prices
3. Filters to only include categories with more than 100 items
The result is a summary statistics table for each product category
that has significant volume (>100 items).
"""
Code Fixing
# User prompt: "Fix this error: Column 'price' does not exist"
# Original (buggy) code:
df.select("product_name", "price", "quantity")
# Copilot suggests:
# The column is named 'unit_price', not 'price'. Here's the fix:
df.select("product_name", "unit_price", "quantity")
# Or if you need to rename it:
df.select(
    "product_name",
    F.col("unit_price").alias("price"),
    "quantity"
)
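Before accepting a fix like this, it is worth confirming the real column names yourself. A quick check from the notebook, where df stands for whichever DataFrame you are debugging:
# Inspect the schema before trusting a suggested rename
df.printSchema()
print(df.columns)  # e.g. ['product_name', 'unit_price', 'quantity']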
Copilot for Data Warehouse
Natural Language to SQL
-- User prompt: "Show me monthly sales trends for 2024 compared to 2023"
-- Copilot generates:
WITH monthly_sales AS (
    SELECT
        YEAR(order_date) as year,
        MONTH(order_date) as month,
        SUM(total_amount) as total_sales,
        COUNT(DISTINCT order_id) as order_count
    FROM dbo.orders
    WHERE YEAR(order_date) IN (2023, 2024)
    GROUP BY YEAR(order_date), MONTH(order_date)
)
SELECT
    m2024.month,
    m2024.total_sales as sales_2024,
    m2023.total_sales as sales_2023,
    ROUND((m2024.total_sales - m2023.total_sales) * 100.0 / NULLIF(m2023.total_sales, 0), 2) as yoy_growth_pct
FROM monthly_sales m2024
LEFT JOIN monthly_sales m2023
    ON m2024.month = m2023.month
    AND m2023.year = 2023
WHERE m2024.year = 2024
ORDER BY m2024.month;
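The same comparison works from a notebook if you would rather stay in PySpark. A rough equivalent, assuming the orders data sits in a Delta table at Tables/orders (the path and column names are assumptions):
from pyspark.sql import functions as F

orders = spark.read.format("delta").load("Tables/orders")

monthly = (
    orders
    .filter(F.year("order_date").isin(2023, 2024))
    .groupBy(F.year("order_date").alias("year"), F.month("order_date").alias("month"))
    .agg(F.sum("total_amount").alias("total_sales"))
)

yoy = (
    monthly.filter(F.col("year") == 2024)
    .select("month", F.col("total_sales").alias("sales_2024"))
    .join(
        monthly.filter(F.col("year") == 2023)
        .select("month", F.col("total_sales").alias("sales_2023")),
        on="month", how="left",
    )
    .withColumn(
        "yoy_growth_pct",
        F.round((F.col("sales_2024") - F.col("sales_2023")) * 100 / F.col("sales_2023"), 2),
    )
    .orderBy("month")
)
display(yoy)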
Query Optimization
-- User prompt: "Optimize this query for better performance"
-- Original query:
SELECT * FROM orders o
WHERE o.customer_id IN (
SELECT customer_id FROM customers WHERE region = 'West'
)
AND o.order_date > '2024-01-01';
-- Copilot suggests:
-- Use JOIN instead of IN for better performance with large datasets:
SELECT o.*
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE c.region = 'West'
AND o.order_date > '2024-01-01';
-- Additional recommendations:
-- 1. Add index on customers.region if not exists
-- 2. Add index on orders.order_date if not exists
-- 3. Consider partitioning orders table by order_date
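Before adopting a rewrite like this, it is cheap to verify the two forms return identical rows. A sketch from a notebook, assuming both tables are registered with Spark SQL (table names are assumptions):
original = spark.sql("""
    SELECT o.* FROM orders o
    WHERE o.customer_id IN (SELECT customer_id FROM customers WHERE region = 'West')
      AND o.order_date > '2024-01-01'
""")
rewritten = spark.sql("""
    SELECT o.* FROM orders o
    INNER JOIN customers c ON o.customer_id = c.customer_id
    WHERE c.region = 'West' AND o.order_date > '2024-01-01'
""")

# Same row count and no rows unique to either side
assert original.count() == rewritten.count()
assert original.exceptAll(rewritten).isEmpty()
If customers ever contained duplicate customer_id values, the JOIN form would return extra rows, which is exactly the kind of difference this check catches.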
Copilot for Power BI
Report Creation
User prompt: "Create a sales dashboard with:
- KPI cards for total revenue, orders, and average order value
- Line chart showing monthly trends
- Bar chart of top 10 products
- Map showing sales by region"
Copilot:
1. Creates page layout with optimal tile placement
2. Adds KPI cards with appropriate measures (example measures sketched after this list)
3. Creates line chart with proper date hierarchy
4. Builds bar chart with top N filter
5. Adds map visual with geographic data mapping
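The measures behind those KPI cards might look like the following. A minimal DAX sketch, assuming a Sales table with total_amount and order_id columns (all names hypothetical):
Total Revenue = SUM(Sales[total_amount])

Total Orders = DISTINCTCOUNT(Sales[order_id])

Avg Order Value = DIVIDE([Total Revenue], [Total Orders])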
DAX Generation
// User prompt: "Create a measure for year-over-year growth percentage"
// Copilot generates:
YoY Growth % =
VAR CurrentYearSales = [Total Sales]
VAR PriorYearSales =
    CALCULATE(
        [Total Sales],
        SAMEPERIODLASTYEAR('Date'[Date])
    )
RETURN
    IF(
        NOT ISBLANK(PriorYearSales) && PriorYearSales <> 0,
        DIVIDE(CurrentYearSales - PriorYearSales, PriorYearSales),
        BLANK()
    )
Insight Suggestions
Copilot analysis of sales data:
Key Insights Detected:
1. Revenue increased 15% in Q2 compared to Q1
2. Product category "Electronics" drives 45% of total revenue
3. Weekend orders are 30% higher than weekday orders
4. Customer retention rate dropped 5% in the last month
5. Top 3 regions account for 70% of all orders
Suggested Visualizations:
- Decomposition tree to analyze revenue drivers
- Cohort analysis for customer retention
- Anomaly detection on daily sales
Copilot for Data Factory
Pipeline Generation
User prompt: "Create a pipeline that:
1. Copies data from Azure SQL to Lakehouse daily
2. Transforms the data using a notebook
3. Updates the warehouse
4. Sends email on completion"
Copilot generates pipeline with:
- Copy activity with Azure SQL source
- Notebook activity with dependency
- Stored procedure activity for warehouse
- Web activity for email notification
- Schedule trigger for daily execution
Data Flow Suggestions
# User describes transformation needs
# Copilot suggests the equivalent PySpark for a notebook
# (Dataflow Gen2 itself is authored in Power Query M):
# Source: SQL Server customers table (credential options omitted here)
from pyspark.sql import functions as F

source = (
    spark.read.format("sqlserver")
    .option("host", "server.database.windows.net")
    .option("database", "sales_db")
    .option("dbtable", "customers")
    .load()
)

# Transform: Clean and standardize
transformed = (
    source
    .withColumn("full_name", F.concat_ws(" ", "first_name", "last_name"))
    .withColumn("email_domain", F.split("email", "@").getItem(1))
    .withColumn("created_date", F.to_date("created_at"))
    .filter(F.col("is_active"))
    .select("customer_id", "full_name", "email", "email_domain",
            "region", "created_date")
)

# Destination: upsert into the Lakehouse delta table
# ("merge" is not a valid write mode; upserts use the DeltaTable API,
# which assumes the target table already exists)
from delta.tables import DeltaTable

target = DeltaTable.forPath(spark, "Tables/dim_customers")
(
    target.alias("t")
    .merge(transformed.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
Copilot for KQL
Query Generation
// User prompt: "Find devices with anomalous temperature readings in the last hour"
// Copilot generates:
SensorData
| where EventTime > ago(1h)
| where SensorType == "temperature"
| summarize
    avg_temp = avg(Value),
    stdev_temp = stdev(Value),
    readings = count()
    by DeviceId
| join kind=inner (
    SensorData
    | where EventTime > ago(1h)
    | where SensorType == "temperature"
) on DeviceId
| where stdev_temp > 0 and abs(Value - avg_temp) > 3 * stdev_temp
| project
    EventTime,
    DeviceId,
    Value,
    avg_temp,
    deviation = abs(Value - avg_temp) / stdev_temp
| order by deviation desc
Best Practices for Using Copilot
Effective Prompting
Good prompts:
- "Create a PySpark function that calculates rolling 7-day average sales by product"
- "Write a SQL query to find customers who ordered in January but not February"
- "Generate DAX for month-over-month growth with null handling"
Less effective prompts:
- "Make a chart" (too vague)
- "Fix it" (missing context)
- "Better code" (no specific goal)
Review and Validate
# Always review Copilot output
# Copilot suggestion (may need validation):
result = df.filter(F.col("date") > "2024-01-01")
# Validated and improved:
from datetime import date
from pyspark.sql import functions as F

# Use a proper date literal for type safety
result = df.filter(F.col("date") > F.lit(date(2024, 1, 1)))
# Add a data quality check
assert result.count() > 0, "No data after filter - check date range"
Copilot Availability
| Feature | SKU Required | Status |
|---|---|---|
| Notebooks | F64+ | GA |
| Data Warehouse | F64+ | GA |
| Power BI | PPU/F64+ | GA |
| Data Factory | F64+ | Preview |
| Real-Time Intelligence | F64+ | Preview |
What’s Next
Tomorrow I’ll cover Copilot for Notebooks in more depth.