April 2, 2024 1 min read

Fabric Copilot Improvements: Smarter AI Assistance

Microsoft Fabric Copilot AI Data Analytics

Fabric Copilot Improvements: Smarter AI Assistance

Microsoft Fabric Copilot has received significant upgrades, making AI-assisted data analysis more powerful and contextually aware. Here’s what’s new and how to leverage these improvements.

New Copilot Capabilities

# Copilot now supports more complex queries and multi-turn conversations

COPILOT_CAPABILITIES_APRIL_2024 = {
    "natural_language_to_sql": {
        "improvements": [
            "Better understanding of business terminology",
            "Support for complex joins",
            "Window function generation",
            "Subquery optimization"
        ]
    },
    "data_analysis": {
        "improvements": [
            "Automatic insight generation",
            "Anomaly detection suggestions",
            "Trend analysis",
            "Correlation discovery"
        ]
    },
    "code_generation": {
        "improvements": [
            "PySpark code from descriptions",
            "DAX measure suggestions",
            "M query generation",
            "Python visualization code"
        ]
    }
}

Using Copilot for SQL Generation

# Example: How Copilot translates natural language to SQL

# User prompt: "Show me total sales by region for the last quarter,
#               compared to the same period last year"

# Copilot generates:
generated_sql = """
WITH current_quarter AS (
    SELECT
        region,
        SUM(amount) as total_sales
    FROM sales
    WHERE order_date >= DATEADD(quarter, -1, GETDATE())
      AND order_date < GETDATE()
    GROUP BY region
),
previous_year AS (
    SELECT
        region,
        SUM(amount) as total_sales
    FROM sales
    WHERE order_date >= DATEADD(year, -1, DATEADD(quarter, -1, GETDATE()))
      AND order_date < DATEADD(year, -1, GETDATE())
    GROUP BY region
)
SELECT
    COALESCE(c.region, p.region) as region,
    c.total_sales as current_quarter_sales,
    p.total_sales as previous_year_sales,
    ROUND((c.total_sales - p.total_sales) / p.total_sales * 100, 2) as yoy_change_pct
FROM current_quarter c
FULL OUTER JOIN previous_year p ON c.region = p.region
ORDER BY current_quarter_sales DESC
"""

Copilot in Notebooks

# Copilot assists with PySpark code generation

# User prompt: "Load the sales data, clean nulls, and aggregate by product category"

# Copilot generates:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, sum, avg, count, when, coalesce

spark = SparkSession.builder.getOrCreate()

# Load data from lakehouse
df = spark.read.format("delta").load("Tables/sales_data")

# Clean null values
df_cleaned = df.na.fill({
    "quantity": 0,
    "price": 0.0,
    "discount": 0.0,
    "product_category": "Unknown"
})

# Handle null amounts by calculating from quantity * price
df_cleaned = df_cleaned.withColumn(
    "amount",
    coalesce(
        col("amount"),
        col("quantity") * col("price") * (1 - col("discount"))
    )
)

# Aggregate by product category
df_aggregated = df_cleaned.groupBy("product_category").agg(
    count("*").alias("transaction_count"),
    sum("quantity").alias("total_quantity"),
    sum("amount").alias("total_revenue"),
    avg("amount").alias("avg_transaction_value")
)

# Display results
display(df_aggregated.orderBy(col("total_revenue").desc()))

Contextual Awareness

# Copilot now understands your data model and business context

# With the new context-aware features, Copilot can:

class CopilotContext:
    """Example of how Copilot maintains context"""

    def __init__(self, workspace_id: str):
        self.workspace_id = workspace_id
        self.conversation_history = []
        self.data_model_context = {}

    def add_data_model_context(self, model_metadata: dict):
        """Add semantic model context for better suggestions"""
        self.data_model_context = {
            "tables": model_metadata.get("tables", []),
            "relationships": model_metadata.get("relationships", []),
            "measures": model_metadata.get("measures", []),
            "business_glossary": model_metadata.get("glossary", {})
        }

    def process_query(self, natural_language: str) -> str:
        """Process query with full context"""
        # Copilot uses:
        # 1. Data model metadata (columns, types, relationships)
        # 2. Business glossary (what terms mean)
        # 3. Previous conversation turns
        # 4. Query patterns from similar workspaces

        self.conversation_history.append({
            "role": "user",
            "content": natural_language
        })

        # Example: "revenue" is understood as SUM(Sales[Amount])
        # because it's defined in the business glossary
        return self._generate_response()

# Example business glossary
business_glossary = {
    "revenue": "SUM(Sales[Amount])",
    "profit": "SUM(Sales[Amount]) - SUM(Sales[Cost])",
    "active customers": "DISTINCTCOUNT(Sales[CustomerID]) where last_order > today - 90 days",
    "churn rate": "Customers lost / Total customers at period start"
}

DAX Generation Improvements

# Copilot generates more sophisticated DAX measures

# User prompt: "Create a measure for running total of sales that resets each year"

# Copilot generates:
dax_measure = """
Running Total YTD =
VAR CurrentDate = MAX('Calendar'[Date])
VAR YearStart = DATE(YEAR(CurrentDate), 1, 1)
RETURN
    CALCULATE(
        SUM(Sales[Amount]),
        FILTER(
            ALL('Calendar'[Date]),
            'Calendar'[Date] >= YearStart &&
            'Calendar'[Date] <= CurrentDate
        )
    )
"""

# User prompt: "Show me sales with a moving average and highlight when sales are
#               20% above or below the average"

# Copilot generates:
moving_average_dax = """
Sales vs Moving Avg =
VAR CurrentSales = SUM(Sales[Amount])
VAR MovingAvg =
    AVERAGEX(
        DATESINPERIOD(
            'Calendar'[Date],
            MAX('Calendar'[Date]),
            -30,
            DAY
        ),
        CALCULATE(SUM(Sales[Amount]))
    )
VAR Variance = DIVIDE(CurrentSales - MovingAvg, MovingAvg)
RETURN
    SWITCH(
        TRUE(),
        Variance > 0.2, "Above Average",
        Variance < -0.2, "Below Average",
        "Normal"
    )
"""

Best Practices for Copilot

COPILOT_BEST_PRACTICES = {
    "be_specific": [
        "Include date ranges explicitly",
        "Name the exact columns you need",
        "Specify aggregation types"
    ],
    "provide_context": [
        "Reference table names",
        "Explain business terms",
        "Describe expected output"
    ],
    "iterate": [
        "Start simple, then add complexity",
        "Use follow-up questions",
        "Ask for optimizations"
    ],
    "validate": [
        "Review generated code before running",
        "Test on sample data first",
        "Check for edge cases"
    ]
}

Conclusion

Fabric Copilot’s improvements make it a powerful assistant for data professionals. The enhanced context awareness, better SQL/DAX generation, and multi-turn conversations significantly improve productivity in data analysis workflows.