Microsoft Fabric at Ignite 2024: What's New for Data Professionals
Microsoft Ignite 2024 brought significant updates to Microsoft Fabric. Let’s explore the key announcements and what they mean for data professionals.
Major Announcements
1. Fabric AI Skills
AI Skills enable natural language interaction with your data:
# AI Skills are configured through the Fabric portal and accessed via REST API
from azure.identity import DefaultAzureCredential
import requests
credential = DefaultAzureCredential()
token = credential.get_token("https://api.fabric.microsoft.com/.default").token
headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}
workspace_id = "your-workspace-id"
ai_skill_id = "your-ai-skill-id"
# Query an AI Skill via REST API
query_url = f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/aiskills/{ai_skill_id}/query"
query_payload = {
    "question": "What were our top products last quarter by region?"
}
response = requests.post(query_url, headers=headers, json=query_payload)
result = response.json()
print(result.get("answer"))
print(result.get("sqlQuery"))
print(result.get("data"))
# Note: AI Skills are created and configured in the Fabric portal UI,
# including data source connections, instructions, and allowed operations.
2. Analytics Agents
Pre-built agents automate common analytics tasks; the snippet below shows the same kinds of analyses scripted with Semantic Link and PySpark:
# Analytics capabilities in Fabric using Semantic Link and PySpark
import sempy.fabric as fabric
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
spark = SparkSession.builder.getOrCreate()
# Explore a dataset using PySpark
df = spark.read.format("delta").load("Tables/gold/sales_fact")
# Generate summary statistics
print("Dataset Summary:")
df.describe().show()
# Key metrics
metrics = df.agg(
    F.count("*").alias("row_count"),
    F.countDistinct("customer_id").alias("unique_customers"),
    F.sum("revenue").alias("total_revenue"),
    F.avg("revenue").alias("avg_revenue")
).collect()[0]
print(f"Total rows: {metrics['row_count']}")
print(f"Unique customers: {metrics['unique_customers']}")
print(f"Total revenue: ${metrics['total_revenue']:,.2f}")
# Simple anomaly detection using statistics
daily_revenue = spark.read.format("delta").load("Tables/gold/daily_revenue")
stats = daily_revenue.agg(
    F.avg("revenue").alias("mean"),
    F.stddev("revenue").alias("stddev")
).collect()[0]
mean_val, stddev_val = stats["mean"], stats["stddev"]
threshold = 2.5 # Z-score threshold
anomalies = daily_revenue.filter(
    F.abs(F.col("revenue") - mean_val) > (threshold * stddev_val)
).orderBy(F.desc("date"))
print("Detected anomalies:")
anomalies.show()
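# Semantic Link (imported above) is how an agent-style workflow can ground itself
# in governed measures from a semantic model. A minimal sketch: the model name
# "SalesModel", measure "Total Revenue", and column dim_region[region_name] are assumptions.
print(fabric.list_datasets())
measure_df = fabric.evaluate_measure(
    dataset="SalesModel",
    measure="Total Revenue",
    groupby_columns=["dim_region[region_name]"]
)
print(measure_df.head())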
3. OneLake AI Workloads
Run AI workloads directly on OneLake data:
# Access OneLake data and create embeddings using Azure OpenAI
from azure.identity import DefaultAzureCredential
from openai import AzureOpenAI
from pyspark.sql import SparkSession
import json
spark = SparkSession.builder.getOrCreate()
# Read documents from OneLake Files
docs_df = spark.read.text("Files/documents/*.txt")
documents = [row.value for row in docs_df.collect()]
# Create embeddings using Azure OpenAI
credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default").token
client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com",
    api_version="2024-02-01",
    azure_ad_token=token
)
embeddings = []
for i, doc in enumerate(documents):
    response = client.embeddings.create(
        model="text-embedding-3-large",
        input=doc
    )
    embeddings.append({
        "id": i,
        "text": doc,
        "embedding": response.data[0].embedding
    })
# Save embeddings to OneLake as JSON
embeddings_df = spark.createDataFrame(embeddings)
embeddings_df.write.mode("overwrite").json("Files/embeddings/")
print(f"Processed {len(embeddings)} documents")
# For vector search, use Azure AI Search or store in a vector-capable database
# integrated with Fabric (e.g., Azure Cosmos DB with vector search)
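# For a quick experiment before wiring up a dedicated vector store, a brute-force
# cosine-similarity search over the saved embeddings works. A minimal sketch; the
# query text below is just an illustration.
import numpy as np
query = "quarterly revenue trends"
query_vec = np.array(
    client.embeddings.create(model="text-embedding-3-large", input=query).data[0].embedding
)
doc_matrix = np.array([e["embedding"] for e in embeddings])
scores = doc_matrix @ query_vec / (
    np.linalg.norm(doc_matrix, axis=1) * np.linalg.norm(query_vec)
)
# Print the three closest documents
for idx in np.argsort(scores)[::-1][:3]:
    print(f"{scores[idx]:.3f}  {embeddings[idx]['text'][:80]}")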
4. Fabric and Copilot Studio Integration
Build Copilot experiences connected to Fabric:
# Copilot Studio integration is configured through the Power Platform portal
# For programmatic access to Fabric data from custom copilots, use REST APIs
from azure.identity import DefaultAzureCredential
import requests
credential = DefaultAzureCredential()
token = credential.get_token("https://api.fabric.microsoft.com/.default").token
headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}
workspace_id = "your-workspace-id"
# List available artifacts that can be connected to Copilot Studio
artifacts_url = f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/items"
response = requests.get(artifacts_url, headers=headers)
items = response.json().get("value", [])
# Filter for lakehouses and semantic models
for item in items:
    if item["type"] in ["Lakehouse", "SemanticModel"]:
        print(f"Available for Copilot: {item['displayName']} ({item['type']})")
# Note: Copilot Studio configuration is done in the Power Platform portal:
# 1. Create a new copilot in Copilot Studio
# 2. Add a Fabric connector as a data source
# 3. Configure the lakehouse and semantic model connections
# 4. Define topics and conversation flows
# 5. Deploy to Teams or other channels
# For custom integrations, use Semantic Link to query data
import sempy.fabric as fabric
# Query semantic model for copilot responses
df = fabric.evaluate_dax(
    dataset="SalesModel",
    dax_string='EVALUATE SUMMARIZE(Sales, Sales[Region], "Total", SUM(Sales[Amount]))'
)
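# A custom copilot action can then turn the result into a conversational reply;
# evaluate_dax returns a pandas-style DataFrame, so simple formatting is enough.
response_text = "Here are the sales totals by region:\n" + df.to_string(index=False)
print(response_text)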
5. Eventstream Enhancements
Enhanced real-time streaming capabilities:
# Eventstreams are configured via Fabric portal or REST API
from azure.identity import DefaultAzureCredential
import requests
credential = DefaultAzureCredential()
token = credential.get_token("https://api.fabric.microsoft.com/.default").token
headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}
workspace_id = "your-workspace-id"
base_url = f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
# Create an eventstream item
eventstream_payload = {
    "displayName": "RealTimeAnalytics",
    "type": "Eventstream",
    "description": "Real-time analytics with Event Hubs source"
}
response = requests.post(f"{base_url}/items", headers=headers, json=eventstream_payload)
eventstream = response.json()
print(f"Created Eventstream: {eventstream.get('id')}")
# Note: Eventstream configuration (sources, processors, destinations) is done
# through the Fabric portal UI or by updating the eventstream definition.
#
# Key capabilities available in the portal:
# - Source connectors: Event Hubs, Kafka, Custom endpoints
# - Processors: Filter, Aggregate, Transform, AI enrichment
# - Destinations: Lakehouse, KQL Database, Reflex triggers
#
# For AI enrichment in streaming, you can use Spark Structured Streaming:
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType
spark = SparkSession.builder.getOrCreate()
# Read from Event Hubs using Spark Structured Streaming
eh_conf = {
    # Note: the Event Hubs Spark connector expects the connection string to be
    # encrypted, e.g. via sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt
    "eventhubs.connectionString": "<connection-string>"
}
stream_df = spark.readStream \
    .format("eventhubs") \
    .options(**eh_conf) \
    .load()
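# A minimal enrichment sketch (the event schema is an assumption): decode the
# Event Hubs body to text and tag each event with a simple severity label.
def classify_severity(body_text):
    return "error" if body_text and "error" in body_text.lower() else "ok"

classify_udf = udf(classify_severity, StringType())
enriched_df = stream_df \
    .withColumn("body_text", col("body").cast("string")) \
    .withColumn("severity", classify_udf(col("body_text")))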
# Write enriched events to Lakehouse Delta table
query = enriched_df \
    .writeStream \
    .format("delta") \
    .outputMode("append") \
    .option("checkpointLocation", "Files/checkpoints/events") \
    .toTable("enriched_events")
6. Real-Time AI Dashboards
Build AI-powered real-time dashboards:
# Real-Time dashboards in Fabric use KQL queries and are configured via portal
# For programmatic dashboard creation, use the Fabric REST API
from azure.identity import DefaultAzureCredential
import requests
credential = DefaultAzureCredential()
token = credential.get_token("https://api.fabric.microsoft.com/.default").token
headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}
workspace_id = "your-workspace-id"
base_url = f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
# Create a Real-Time Dashboard item
dashboard_payload = {
    "displayName": "Operations Command Center",
    "type": "KQLDashboard",  # Fabric item type for Real-Time Dashboards
    "description": "Real-time operations monitoring"
}
response = requests.post(f"{base_url}/items", headers=headers, json=dashboard_payload)
dashboard = response.json()
print(f"Created Dashboard: {dashboard.get('id')}")
# Real-time visualizations are powered by KQL queries
# Example KQL queries for dashboard tiles:
# Current status summary
kql_summary = """
operational_metrics
| where timestamp > ago(5m)
| summarize
avg_latency = avg(latency_ms),
error_rate = countif(status == 'error') * 100.0 / count(),
requests_per_sec = count() / 300.0
| project avg_latency, error_rate, requests_per_sec
"""
# Active anomalies
kql_anomalies = """
detected_anomalies
| where timestamp > ago(1h)
| where severity in ('high', 'critical')
| project timestamp, metric_name, actual_value, expected_value, severity
| order by timestamp desc
"""
# For AI-powered summaries, use Azure OpenAI with query results
from openai import AzureOpenAI
client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com",
    api_version="2024-02-01",
    azure_ad_token=token
)
# Generate AI summary from metrics
# (metrics would come from KQL query results)
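# A rough sketch of that summary step, assuming metrics_df from the Kusto query
# above and "gpt-4o" as the name of your Azure OpenAI chat deployment.
summary_prompt = (
    "Write a two-sentence status summary for an operations dashboard "
    f"based on these metrics:\n{metrics_df.to_string(index=False)}"
)
completion = client.chat.completions.create(
    model="gpt-4o",  # assumed deployment name
    messages=[{"role": "user", "content": summary_prompt}]
)
print(completion.choices[0].message.content)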
Migration and Upgrade Path
For existing Fabric users:
# Workspace administration via Fabric REST API
from azure.identity import DefaultAzureCredential
import requests
credential = DefaultAzureCredential()
token = credential.get_token("https://api.fabric.microsoft.com/.default").token
headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json"
}
workspace_id = "your-workspace-id"
# List all items in workspace
items_url = f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}/items"
response = requests.get(items_url, headers=headers)
items = response.json().get("value", [])
# Inventory of workspace assets
item_counts = {}
for item in items:
    item_type = item["type"]
    item_counts[item_type] = item_counts.get(item_type, 0) + 1
print("Workspace asset inventory:")
for item_type, count in item_counts.items():
    print(f" {item_type}: {count}")
# Check workspace capacity and settings
workspace_url = f"https://api.fabric.microsoft.com/v1/workspaces/{workspace_id}"
response = requests.get(workspace_url, headers=headers)
workspace = response.json()
print(f"\nWorkspace: {workspace.get('displayName')}")
print(f"Capacity: {workspace.get('capacityId')}")
# Note: Feature enablement and migrations are typically managed through:
# - Fabric Admin Portal (admin.fabric.microsoft.com)
# - Tenant settings in the Admin portal
# - Capacity settings for specific workloads
#
# For programmatic tenant administration, use the Admin REST APIs:
# https://learn.microsoft.com/en-us/rest/api/fabric/admin
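# For example, a Fabric administrator could inventory every workspace in the tenant.
# A brief sketch: response field names follow the Admin API and are read defensively.
admin_url = "https://api.fabric.microsoft.com/v1/admin/workspaces"
response = requests.get(admin_url, headers=headers)
for ws in response.json().get("workspaces", []):
    print(f"{ws.get('name')} (id: {ws.get('id')}, state: {ws.get('state')})")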
What This Means for Data Teams
- Natural language becomes the interface - AI Skills make data accessible to more users
- Real-time AI is native - No more separate infrastructure for streaming AI
- Copilot integration is seamless - Build conversational experiences on your data
- Analytics agents automate routine tasks - Focus on insights, not data wrangling
The Fabric platform continues to mature into a comprehensive, AI-native analytics solution.