Fabric Git Integration: Version Control for Your Data Platform
Fabric Git integration puts workspace items such as notebooks, pipelines, and semantic models under source control in Azure DevOps or GitHub. Today, I will show you how to set it up and how to work with it day to day.
Git Integration Overview
┌─────────────────────────────────────────────────────┐
│               Fabric Git Integration                │
├─────────────────────────────────────────────────────┤
│                                                     │
│  ┌─────────────────────────────────────────────┐    │
│  │              Fabric Workspace               │    │
│  │  ┌─────────┐  ┌─────────┐  ┌─────────┐      │    │
│  │  │Lakehouse│  │Notebook │  │ Pipeline│      │    │
│  │  └─────────┘  └─────────┘  └─────────┘      │    │
│  │  ┌─────────┐  ┌─────────┐  ┌─────────┐      │    │
│  │  │Dataflow │  │Semantic │  │ Report  │      │    │
│  │  │         │  │  Model  │  │         │      │    │
│  │  └─────────┘  └─────────┘  └─────────┘      │    │
│  └──────────────────────┬──────────────────────┘    │
│                         │                           │
│                         │ Sync                      │
│                         ▼                           │
│  ┌─────────────────────────────────────────────┐    │
│  │               Git Repository                │    │
│  │          (Azure DevOps or GitHub)           │    │
│  │                                             │    │
│  │  /workspace/                                │    │
│  │    /lakehouse.Lakehouse/                    │    │
│  │    /notebook.Notebook/                      │    │
│  │    /pipeline.DataPipeline/                  │    │
│  │    /dataflow.Dataflow/                      │    │
│  └─────────────────────────────────────────────┘    │
│                                                     │
└─────────────────────────────────────────────────────┘
Setting Up Git Integration
Connect to Azure DevOps
# Steps in Fabric Portal:
# 1. Workspace settings > Git integration
# 2. Select Git provider (Azure DevOps or GitHub)
# 3. Authenticate and select organization
# 4. Select repository and branch
# 5. Choose folder for workspace content
git_config = {
    "provider": "AzureDevOps",
    "organization": "myorg",
    "project": "DataPlatform",
    "repository": "fabric-workspace",
    "branch": "main",
    "folder": "/dev-workspace"
}
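You can script the same connection with the Fabric REST API's Git endpoints instead of clicking through the portal. The snippet below is a minimal sketch, not a drop-in tool: the workspace ID and Microsoft Entra access token are placeholders, and the gitProviderDetails field names reflect my reading of the git/connect request, so check them against the current API reference.

# Minimal sketch: connect a workspace to Azure DevOps via the Fabric REST API.
# The workspace ID and token are placeholders; field names should be verified.
import requests

FABRIC_API = "https://api.fabric.microsoft.com/v1"
workspace_id = "<workspace-guid>"        # placeholder
token = "<entra-access-token>"           # placeholder
headers = {"Authorization": f"Bearer {token}"}

connect_body = {
    "gitProviderDetails": {
        "gitProviderType": "AzureDevOps",
        "organizationName": "myorg",
        "projectName": "DataPlatform",
        "repositoryName": "fabric-workspace",
        "branchName": "main",
        "directoryName": "/dev-workspace",
    }
}

resp = requests.post(
    f"{FABRIC_API}/workspaces/{workspace_id}/git/connect",
    headers=headers,
    json=connect_body,
)
resp.raise_for_status()

# A one-time POST to .../git/initializeConnection is then needed before
# commits and updates start flowing.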
Connect to GitHub
# For GitHub:
# 1. Workspace settings > Git integration
# 2. Select GitHub
# 3. Authenticate with GitHub
# 4. Select repository
github_config = {
    "provider": "GitHub",
    "owner": "myorg",
    "repository": "fabric-workspace",
    "branch": "main",
    "folder": "/workspaces/development"
}
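Whichever provider you pick, it is worth verifying the link before relying on it. Here is a hedged sketch that calls the Git status endpoint; it assumes the same placeholder workspace ID and token as the connect snippet, and that the connection has already been initialized.

# Minimal sketch: check the Git sync status of a connected workspace.
import requests

FABRIC_API = "https://api.fabric.microsoft.com/v1"
workspace_id = "<workspace-guid>"        # placeholder
token = "<entra-access-token>"           # placeholder
headers = {"Authorization": f"Bearer {token}"}

resp = requests.get(f"{FABRIC_API}/workspaces/{workspace_id}/git/status", headers=headers)
resp.raise_for_status()
payload = resp.json()

# workspaceHead / remoteCommitHash identify the workspace and Git versions;
# "changes" lists items that differ between the two.
print("Workspace head:", payload.get("workspaceHead"))
print("Remote commit: ", payload.get("remoteCommitHash"))
for change in payload.get("changes", []):
    print(change)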
Supported Item Types
# Items that can be version controlled
supported_items = {
    "fully_supported": [
        "Notebooks",
        "Spark Job Definitions",
        "Pipelines",
        "Dataflows Gen2",
        "Semantic Models (TMDL format)",
        "Reports (PBIR format)",
        "Paginated Reports",
        "Environments"
    ],
    "metadata_only": [
        "Lakehouses (definition, not data)",
        "Warehouses (definition, not data)",
        "KQL Databases",
        "Eventstreams"
    ],
    "not_supported": [
        "Data content in tables",
        "Files in Lakehouse Files folder",
        "Actual data"
    ]
}
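Before connecting a workspace, it can help to see which of these item types it actually contains, since that determines what will land in the repo. A rough sketch using the workspace items API, with the same placeholder token and workspace ID as above:

# Minimal sketch: inventory a workspace's items by type.
from collections import Counter
import requests

FABRIC_API = "https://api.fabric.microsoft.com/v1"
workspace_id = "<workspace-guid>"        # placeholder
token = "<entra-access-token>"           # placeholder

resp = requests.get(
    f"{FABRIC_API}/workspaces/{workspace_id}/items",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
items = resp.json().get("value", [])

# e.g. Counter({'Notebook': 12, 'DataPipeline': 5, 'Lakehouse': 2, ...})
print(Counter(item["type"] for item in items))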
Working with Git
Commit Changes
# After making changes in Fabric:
# 1. Click "Source control" in workspace
# 2. Review changes
# 3. Select items to commit
# 4. Enter commit message
# 5. Commit
# Changes are pushed to the connected Git branch
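For automation scenarios, the same commit can be triggered through the commitToGit endpoint rather than the portal. A hedged sketch follows; it reuses the placeholder token and workspace ID from earlier, and the request shape (mode, workspaceHead, comment) should be confirmed against the API reference.

# Minimal sketch: commit all pending workspace changes to the connected branch.
import requests

FABRIC_API = "https://api.fabric.microsoft.com/v1"
workspace_id = "<workspace-guid>"        # placeholder
token = "<entra-access-token>"           # placeholder
headers = {"Authorization": f"Bearer {token}"}

# The current workspaceHead comes from the Git status endpoint.
status = requests.get(f"{FABRIC_API}/workspaces/{workspace_id}/git/status", headers=headers)
status.raise_for_status()

commit_body = {
    "mode": "All",                                   # or "Selective" plus an item list
    "workspaceHead": status.json().get("workspaceHead"),
    "comment": "Update sales pipeline and notebook",
}
resp = requests.post(
    f"{FABRIC_API}/workspaces/{workspace_id}/git/commitToGit",
    headers=headers,
    json=commit_body,
)
resp.raise_for_status()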
Pull Changes
# To get changes from Git:
# 1. Click "Source control"
# 2. Click "Update all"
# 3. Review incoming changes
# 4. Confirm update
# Workspace items are updated from Git
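The pull direction has a matching updateFromGit endpoint. The sketch below is an assumption-heavy outline: it reads remoteCommitHash from the status call, asks Git to win on conflicts, and glosses over the fact that the call may run as a long-running operation you would normally poll. The conflictResolution and options field names are my best understanding and should be verified.

# Minimal sketch: update the workspace from the connected Git branch.
import requests

FABRIC_API = "https://api.fabric.microsoft.com/v1"
workspace_id = "<workspace-guid>"        # placeholder
token = "<entra-access-token>"           # placeholder
headers = {"Authorization": f"Bearer {token}"}

status = requests.get(f"{FABRIC_API}/workspaces/{workspace_id}/git/status", headers=headers)
status.raise_for_status()

update_body = {
    "remoteCommitHash": status.json().get("remoteCommitHash"),
    "conflictResolution": {
        "conflictResolutionType": "Workspace",
        "conflictResolutionPolicy": "PreferRemote",  # let Git win on conflicts
    },
    "options": {"allowOverrideItems": True},
}
resp = requests.post(
    f"{FABRIC_API}/workspaces/{workspace_id}/git/updateFromGit",
    headers=headers,
    json=update_body,
)
# A 202 response means Fabric accepted the update as a long-running operation.
resp.raise_for_status()
print(resp.status_code)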
Branching Strategy
┌─────────────────────────────────────────────────────┐
│             Recommended Branch Strategy             │
├─────────────────────────────────────────────────────┤
│                                                     │
│  main (Production)                                  │
│   │                                                 │
│   ├── release/v1.0                                  │
│   │                                                 │
│   └── develop                                       │
│        │                                            │
│        ├── feature/new-pipeline                     │
│        │                                            │
│        ├── feature/lakehouse-updates                │
│        │                                            │
│        └── bugfix/fix-dataflow                      │
│                                                     │
│  Workspaces:                                        │
│  - Dev workspace  → develop branch                  │
│  - Test workspace → release branch                  │
│  - Prod workspace → main branch                     │
│                                                     │
└─────────────────────────────────────────────────────┘
File Structure in Git
fabric-workspace/
├── workspace/
│   ├── SalesLakehouse.Lakehouse/
│   │   └── .platform                  # Lakehouse definition
│   │
│   ├── TransformSales.Notebook/
│   │   ├── .platform
│   │   └── notebook-content.py        # Notebook code
│   │
│   ├── DailySalesPipeline.DataPipeline/
│   │   ├── .platform
│   │   └── pipeline-content.json      # Pipeline definition
│   │
│   ├── SalesDataflow.Dataflow/
│   │   ├── .platform
│   │   └── dataflow-content.json      # Dataflow M code
│   │
│   ├── SalesModel.SemanticModel/
│   │   ├── .platform
│   │   ├── model.tmdl                 # Model definition
│   │   ├── tables/
│   │   │   ├── Sales.tmdl
│   │   │   └── Customer.tmdl
│   │   ├── relationships.tmdl
│   │   └── cultures/
│   │       └── en-US.tmdl
│   │
│   └── SalesReport.Report/
│       ├── .platform
│       └── report.json                # Report definition
│
└── .gitignore
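Because every item folder carries a .platform descriptor, you can inventory what the repo contains without opening Fabric at all. A small sketch, assuming .platform is JSON with a metadata block holding type and displayName; open one in your own repo to confirm the exact shape.

# Minimal sketch: list items tracked in the repo by reading .platform files.
import json
from pathlib import Path

for platform_file in sorted(Path("workspace").glob("*/.platform")):
    meta = json.loads(platform_file.read_text()).get("metadata", {})
    item_type = meta.get("type", "?")
    name = meta.get("displayName", platform_file.parent.name)
    print(f"{item_type:<15} {name}")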
CI/CD with Azure DevOps
Build Pipeline
# azure-pipelines.yml
trigger:
  branches:
    include:
      - main
      - develop

pool:
  vmImage: 'ubuntu-latest'

stages:
  - stage: Validate
    jobs:
      - job: ValidateFabricArtifacts
        steps:
          - task: UsePythonVersion@0
            inputs:
              versionSpec: '3.10'
          - script: |
              pip install jsonschema pyyaml
            displayName: 'Install validation tools'
          - script: |
              python scripts/validate_notebooks.py
              python scripts/validate_pipelines.py
            displayName: 'Validate Fabric artifacts'

  - stage: Deploy_Dev
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/develop'))
    jobs:
      - deployment: DeployToDev
        environment: 'Development'
        strategy:
          runOnce:
            deploy:
              steps:
                - script: |
                    echo "Dev workspace syncs automatically via Git"
                  displayName: 'Dev deployment note'

  - stage: Deploy_Prod
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
    jobs:
      - deployment: DeployToProd
        environment: 'Production'
        strategy:
          runOnce:
            deploy:
              steps:
                - task: PowerShell@2
                  inputs:
                    targetType: 'inline'
                    script: |
                      # Use Fabric APIs to trigger deployment
                      # Or use deployment pipelines
                  displayName: 'Deploy to Production'
Validation Script
# scripts/validate_notebooks.py
import os
import sys


def validate_notebook(path):
    """Validate the structure of a Fabric notebook folder."""
    errors = []

    if not os.path.exists(path):
        errors.append(f"Notebook path not found: {path}")
        return errors

    # Check for required files
    platform_file = os.path.join(path, ".platform")
    if not os.path.exists(platform_file):
        errors.append(f"Missing .platform file in {path}")

    # Validate notebook content
    for file in os.listdir(path):
        if file.endswith('.py'):
            notebook_path = os.path.join(path, file)
            with open(notebook_path, 'r') as f:
                content = f.read()

            # Check for common issues
            if 'hardcoded_password' in content.lower():
                errors.append(f"Potential hardcoded credential in {notebook_path}")
            if 'localhost' in content and 'test' not in path.lower():
                errors.append(f"Localhost reference in {notebook_path}")

    return errors


def main():
    workspace_path = "workspace"
    all_errors = []

    for item in os.listdir(workspace_path):
        if item.endswith('.Notebook'):
            item_path = os.path.join(workspace_path, item)
            errors = validate_notebook(item_path)
            all_errors.extend(errors)

    if all_errors:
        print("Validation errors found:")
        for error in all_errors:
            print(f"  - {error}")
        sys.exit(1)
    else:
        print("All notebooks validated successfully")
        sys.exit(0)


if __name__ == "__main__":
    main()
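The build pipeline above also calls scripts/validate_pipelines.py, which I have not shown. Here is a minimal companion sketch along the same lines; it only checks that each *.DataPipeline folder carries a .platform file and that pipeline-content.json parses as JSON.

# scripts/validate_pipelines.py (sketch)
import json
import os
import sys


def validate_pipeline(path):
    """Validate the structure of a Fabric data pipeline folder."""
    errors = []
    if not os.path.exists(os.path.join(path, ".platform")):
        errors.append(f"Missing .platform file in {path}")

    content_file = os.path.join(path, "pipeline-content.json")
    if os.path.exists(content_file):
        try:
            with open(content_file, "r") as f:
                json.load(f)
        except json.JSONDecodeError as exc:
            errors.append(f"Invalid JSON in {content_file}: {exc}")
    else:
        errors.append(f"Missing pipeline-content.json in {path}")
    return errors


def main():
    all_errors = []
    for item in os.listdir("workspace"):
        if item.endswith(".DataPipeline"):
            all_errors.extend(validate_pipeline(os.path.join("workspace", item)))

    if all_errors:
        print("Validation errors found:")
        for error in all_errors:
            print(f"  - {error}")
        sys.exit(1)
    print("All pipelines validated successfully")


if __name__ == "__main__":
    main()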
Best Practices
git_best_practices = {
    "branching": [
        "Use feature branches for development",
        "Protect main/release branches",
        "Require pull request reviews",
        "Use meaningful branch names"
    ],
    "commits": [
        "Write descriptive commit messages",
        "Commit related changes together",
        "Don't commit sensitive data",
        "Review changes before committing"
    ],
    "workflow": [
        "Sync frequently to avoid conflicts",
        "Test changes before pushing",
        "Use deployment pipelines for promotion",
        "Document workspace dependencies"
    ],
    "collaboration": [
        "Establish team conventions",
        "Use consistent naming",
        "Review pull requests thoroughly",
        "Communicate workspace changes"
    ]
}
Git integration enables proper version control and CI/CD for Fabric. Tomorrow, I will cover Fabric Deployment Pipelines.