Azure DevOps YAML Pipelines for Data Projects
YAML pipelines in Azure DevOps bring pipeline-as-code to your data projects. Version control your CI/CD alongside your code.
Basic Pipeline Structure
```yaml
trigger:
  branches:
    include:
      - main
      - develop

pool:
  vmImage: 'ubuntu-latest'

variables:
  pythonVersion: '3.8'
  databricksHost: $(DATABRICKS_HOST)
  databricksToken: $(DATABRICKS_TOKEN)

stages:
  - stage: Test
    jobs:
      - job: UnitTests
        steps:
          - task: UsePythonVersion@0
            inputs:
              versionSpec: '$(pythonVersion)'
          - script: |
              pip install -r requirements.txt
              pip install pytest pytest-cov
            displayName: 'Install dependencies'
          - script: |
              pytest tests/ --cov=src --cov-report=xml
            displayName: 'Run tests'
          - task: PublishCodeCoverageResults@1
            inputs:
              codeCoverageTool: 'Cobertura'
              summaryFileLocation: '$(System.DefaultWorkingDirectory)/coverage.xml'

  - stage: Deploy
    dependsOn: Test
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
    jobs:
      - job: DeployNotebooks
        steps:
          - script: |
              pip install databricks-cli
              # The legacy databricks-cli reads DATABRICKS_HOST and DATABRICKS_TOKEN
              # from the environment set below, so no interactive
              # `databricks configure` step is needed.
              databricks workspace import_dir ./notebooks /Shared/project --overwrite
            displayName: 'Deploy to Databricks'
            env:
              DATABRICKS_HOST: $(databricksHost)
              DATABRICKS_TOKEN: $(databricksToken)
```
Key Patterns
- Separate Test and Deploy stages, so a failed test run blocks the release
- Deployment gated on the main branch via a stage condition
- Secrets kept in secret pipeline variables (or a variable group), never hard-coded in the YAML
- Artifact publishing for audit trails (a sketch of this step follows the list)
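The pipeline above doesn't yet publish an artifact, so here is a minimal sketch of that step using the built-in PublishPipelineArtifact task; the dist/ path and the data-pipeline artifact name are placeholders for whatever your build actually produces:

```yaml
# Hypothetical addition to the Test (or a dedicated Build) stage, after the build/test steps.
- task: PublishPipelineArtifact@1
  inputs:
    targetPath: '$(System.DefaultWorkingDirectory)/dist'  # placeholder: your build output directory
    artifact: 'data-pipeline'                             # placeholder artifact name
    publishLocation: 'pipeline'
```

Published artifacts are retained with the run, giving you a record of exactly what was deployed. For the secrets, the same idea applies: reference a variable group (ideally backed by Azure Key Vault) instead of defining tokens inline in the pipeline definition.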
Pipeline-as-code means your deployment process is reviewed, versioned, and reproducible.