Microsoft Graph Data Connect: Bulk Access to Microsoft 365 Data
The Challenge with Graph API at Scale
Using the standard Graph API for analytics has limitations:
- Rate limiting kicks in quickly
- Pagination through millions of records is slow
- Token management at scale is complex
- The API is built for near-real-time, per-request access rather than batch analytics
Graph Data Connect solves this by delivering bulk extracts of Microsoft 365 data to your own Azure storage through Azure Data Factory pipelines.
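To make the pagination and throttling cost concrete, here is a minimal sketch of the loop a client ends up writing against the standard Graph API. It follows the `@odata.nextLink` field that Graph returns on paged collections and waits when a page comes back with HTTP 429. The `fetch_page` callable and the shape of its return value are assumptions made for illustration, so the sketch runs without a network call.

```python
import time
from typing import Callable, Dict, Iterator, Optional, Tuple

# Hypothetical transport signature: takes a URL, returns (status_code, headers, body).
FetchPage = Callable[[str], Tuple[int, Dict[str, str], dict]]


def iter_records(fetch_page: FetchPage, url: str,
                 sleep: Callable[[float], None] = time.sleep,
                 max_retries: int = 5) -> Iterator[dict]:
    """Yield every record in a paged Graph collection, one page at a time."""
    next_url: Optional[str] = url
    while next_url:
        for attempt in range(max_retries + 1):
            status, headers, body = fetch_page(next_url)
            if status != 429:
                break
            if attempt == max_retries:
                raise RuntimeError(f"still throttled after {max_retries} retries")
            # Throttled: wait for the interval the service asks for.
            sleep(float(headers.get("Retry-After", "1")))
        if status != 200:
            raise RuntimeError(f"unexpected status {status} for {next_url}")
        yield from body.get("value", [])
        # Graph includes @odata.nextLink only while more pages remain.
        next_url = body.get("@odata.nextLink")
```

Every page is a separate round trip, and every throttled page adds a wait, which is why this pattern gets slow once the collection reaches millions of records.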
How It Works
- Configure a data pipeline in Azure Data Factory
- Request specific datasets (emails, calendar, files metadata)
- A Microsoft 365 admin approves the request
- Data is delivered to your Azure storage
The data lands in Azure Storage as JSON files, ready for processing with Synapse, Databricks, or any data platform.
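Before wiring up a full data platform, it can help to inspect a few delivered files locally. The sketch below assumes the files have been downloaded to a local folder and that each file holds one JSON object per line; both the folder layout and the one-object-per-line format are assumptions to check against your own output.

```python
import json
from pathlib import Path
from typing import Iterator


def read_records(folder: str) -> Iterator[dict]:
    """Yield one dict per non-empty line from every file under `folder`."""
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:
                    yield json.loads(line)
```

Because it is a generator, it reads one line at a time and does not load a whole extract into memory.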
Setting Up a Pipeline
First, register an app in Azure AD with the Graph Data Connect permissions:
{
  "name": "graph-data-connect-app",
  "requiredResourceAccess": [
    {
      "resourceAppId": "00000003-0000-0000-c000-000000000000",
      "resourceAccess": [
        {
          "id": "810c84a8-4a9e-49e6-bf7d-12d183f40d01",
          "type": "Role"
        }
      ]
    }
  ]
}
Then create the Azure Data Factory pipeline:
{
  "name": "CopyEmailMetadata",
  "properties": {
    "activities": [
      {
        "name": "CopyFromOffice365",
        "type": "Copy",
        "inputs": [
          {
            "referenceName": "Office365EmailDataset",
            "type": "DatasetReference"
          }
        ],
        "outputs": [
          {
            "referenceName": "AzureBlobOutput",
            "type": "DatasetReference"
          }
        ],
        "typeProperties": {
          "source": {
            "type": "Office365Source",
            "dateFilterColumn": "receivedDateTime",
            "startTime": "2021-04-01T00:00:00Z",
            "endTime": "2021-05-01T00:00:00Z",
            "userScopeFilterUri": "https://graph.microsoft.com/v1.0/groups/{group-id}/members"
          },
          "sink": {
            "type": "BlobSink"
          }
        }
      }
    ]
  }
}
Available Datasets
Graph Data Connect provides access to:
Email
- Message headers and bodies
- Sent and received metadata
- Attachment information (not content)
Calendar
- Meeting metadata
- Attendee information
- Recurring event patterns
People
- Organization hierarchy
- Contact relationships
- Collaboration patterns
Files (OneDrive/SharePoint)
- File metadata
- Sharing information
- Activity signals
Data Processing with Synapse
Once data lands in storage, process it with Synapse:
# Synapse Spark notebook
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

# Synapse notebooks predefine `spark`; getOrCreate() reuses that session
# and also lets the script run outside a notebook.
spark = SparkSession.builder.getOrCreate()

# Read email metadata from Azure storage
emails_df = spark.read.json("abfss://m365data@yourstorage.dfs.core.windows.net/emails/")

# Flatten the nested JSON structure into the columns needed for analysis
parsed_df = emails_df.select(
    col("id"),
    col("subject"),
    col("receivedDateTime"),
    col("sender.emailAddress.address").alias("senderEmail"),
    col("toRecipients").alias("recipients"),
    col("importance")
)

# Calculate email volume by sender, highest first
email_volume = parsed_df.groupBy("senderEmail") \
    .count() \
    .orderBy(col("count").desc())

# Analyze patterns by time: messages received per calendar day
daily_volume = parsed_df \
    .withColumn("date", to_date("receivedDateTime")) \
    .groupBy("date") \
    .count() \
    .orderBy("date")

# Save to Synapse SQL pool for BI.
# "com.databricks.spark.sqldw" is the Azure Databricks connector for Synapse,
# so this step assumes that connector is available on the cluster.
email_volume.write \
    .format("com.databricks.spark.sqldw") \
    .option("url", "jdbc:sqlserver://synapse.sql.azuresynapse.net:1433") \
    .option("dbtable", "analytics.email_volume") \
    .option("tempDir", "abfss://temp@storage.dfs.core.windows.net/tempdata") \
    .mode("overwrite") \
    .save()
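A quick way to sanity-check the per-sender aggregation is to recompute it on a small sample in plain Python and compare the numbers. This is a minimal sketch, assuming each record carries the same `sender.emailAddress.address` path that the Spark job selects; `sender_volume` is a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def sender_volume(records: Iterable[dict]) -> List[Tuple[str, int]]:
    """Count messages per sender address, highest volume first."""
    counts: Counter = Counter()
    for record in records:
        # Mirrors the sender.emailAddress.address path used in the Spark select.
        sender = record.get("sender") or {}
        email = sender.get("emailAddress") or {}
        address = email.get("address")
        if address:
            counts[address] += 1
    return counts.most_common()
```

Records without a sender address are skipped here, whereas the Spark job groups them under a null key, so account for that difference when comparing totals.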
Privacy and Compliance
Graph Data Connect includes privacy controls:
Admin Consent
Every pipeline requires approval from a Microsoft 365 admin. They see exactly what data is being requested and can approve or deny.
User Scoping
You can limit data extraction to specific groups:
{
  "userScopeFilterUri": "https://graph.microsoft.com/v1.0/groups/{group-id}/members"
}
Data Columns
You can select only the columns you need, minimizing data exposure:
{
  "outputColumns": [
    {"name": "id"},
    {"name": "subject"},
    {"name": "receivedDateTime"},
    {"name": "importance"}
  ]
}
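If the column list is maintained in code or configuration, a small helper can generate the `outputColumns` fragment so it stays consistent across pipelines. This is a sketch only; `build_output_columns` is a hypothetical helper, and it does not check the names against the dataset schema.

```python
import json
from typing import Dict, Iterable, List


def build_output_columns(names: Iterable[str]) -> List[Dict[str, str]]:
    """Return the outputColumns list for the given names, in order, without duplicates."""
    seen = set()
    columns = []
    for name in names:
        name = name.strip()
        if not name:
            raise ValueError("column names must be non-empty")
        if name not in seen:
            seen.add(name)
            columns.append({"name": name})
    return columns


fragment = {
    "outputColumns": build_output_columns(
        ["id", "subject", "receivedDateTime", "importance"]
    )
}
print(json.dumps(fragment, indent=2))
```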
Pseudonymization
Optionally hash user identifiers:
{
  "allowedGroups": ["analytics-team-group-id"],
  "userEmailObfuscation": "hashUserEmail"
}
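The same idea can be applied downstream to any identifier that still appears in the extracted data. The sketch below uses a keyed SHA-256 hash (HMAC) so the same address always maps to the same token without being reversible by a simple lookup. It illustrates the concept only and is not a description of how Graph Data Connect hashes identifiers; `pseudonymize` and the key handling are assumptions.

```python
import hashlib
import hmac


def pseudonymize(email: str, secret_key: bytes) -> str:
    """Return a stable, keyed SHA-256 token for an email address."""
    # Normalize first so "Ana@Contoso.com" and "ana@contoso.com" share a token.
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()
```

Keeping the key outside the analytics environment matters: anyone holding both the key and a list of addresses can rebuild the mapping.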
Use Cases
Workplace Analytics Alternative
Build custom analytics dashboards showing collaboration patterns:
- Meeting time per team
- Email volume trends
- Cross-team collaboration
Compliance Monitoring
Extract communication metadata for compliance reviews:
- External communication patterns
- Sensitive content flags
- Policy violation detection
Migration Planning
Understand usage patterns before migrations:
- Active vs inactive mailboxes
- Storage consumption
- User activity patterns
Cost Considerations
Graph Data Connect charges per record extracted:
- Messages: $0.000025 per message
- Calendar events: $0.000025 per event
- Files: $0.00001 per file
For a 10,000-user organization extracting 1M messages per month, expect around $25/month for message extraction alone (1,000,000 × $0.000025).
Add Azure Data Factory and storage costs on top.
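The per-record arithmetic is easy to script when estimating other volumes. This sketch uses only the prices quoted above and `Decimal` to avoid floating-point rounding on very small unit prices; the function name and dictionary keys are illustrative.

```python
from decimal import Decimal

# Per-record prices quoted above, in USD.
PRICE_PER_RECORD = {
    "messages": Decimal("0.000025"),
    "calendar_events": Decimal("0.000025"),
    "files": Decimal("0.00001"),
}


def monthly_extraction_cost(record_counts: dict) -> Decimal:
    """Sum price * count across dataset types; excludes Data Factory and storage."""
    return sum(
        (PRICE_PER_RECORD[kind] * count for kind, count in record_counts.items()),
        Decimal("0"),
    )


# The 1M-messages example from the text.
print(monthly_extraction_cost({"messages": 1_000_000}))
```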
Approval Workflow
The admin approval flow is critical:
- Pipeline triggers
- Request appears in Microsoft 365 admin center
- Admin reviews requested data scope
- Admin approves or denies
- If approved, data extraction begins
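# List pending data access requests via Exchange Online PowerShell
# (cmdlet name as I recall it from the Graph Data Connect approval docs; verify before relying on it)
Connect-ExchangeOnline
Get-ElevatedAccessRequest | Where-Object { $_.RequestStatus -eq 'Pending' }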
Incremental Extraction
For ongoing analytics, implement incremental loads:
{
  "typeProperties": {
    "source": {
      "type": "Office365Source",
      "dateFilterColumn": "receivedDateTime",
      "startTime": {
        "value": "@pipeline().parameters.lastRunTime",
        "type": "Expression"
      },
      "endTime": {
        "value": "@utcnow()",
        "type": "Expression"
      }
    }
  }
}
Track the last successful run time and use it for the next extraction.
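One way to track that value is a small watermark file that the orchestration code reads before each run and updates only after the run succeeds. The sketch below is a minimal version of that bookkeeping; the file location, the `lastRunTime` key in the file, and the helper names are assumptions, and in practice the watermark would more likely live in a table or a blob.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Optional

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def load_last_run(path: Path, default: str) -> str:
    """Return the stored watermark, or `default` on the first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))["lastRunTime"]
    return default


def next_window(last_run: str, now: Optional[datetime] = None) -> Dict[str, str]:
    """Build the startTime/endTime pair for the next extraction."""
    now = now or datetime.now(timezone.utc)
    end = now.astimezone(timezone.utc).strftime(ISO_FORMAT)
    # Same-format UTC timestamps compare correctly as strings.
    if end <= last_run:
        raise ValueError("endTime must be later than the last run time")
    return {"startTime": last_run, "endTime": end}


def save_last_run(path: Path, window: Dict[str, str]) -> None:
    """Persist endTime as the new watermark; call only after the run succeeds."""
    path.write_text(json.dumps({"lastRunTime": window["endTime"]}), encoding="utf-8")
```

Using the previous `endTime` as the next `startTime` keeps the windows contiguous, so no interval is skipped between runs.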
Getting Started
- Request access to the Graph Data Connect preview
- Create an Azure AD app with appropriate permissions
- Set up Azure Data Factory and storage
- Work with your Microsoft 365 admin on approval workflows
- Start with a small pilot group before scaling
Graph Data Connect bridges the gap between Microsoft 365’s rich data and enterprise analytics platforms. It’s a powerful tool when you need insights across your organization.