
Azure DevOps YAML Pipelines for Data Projects

YAML pipelines in Azure DevOps bring pipeline-as-code to your data projects: the CI/CD definition lives in version control alongside the code it builds and deploys.

Basic Pipeline Structure

trigger:
  branches:
    include:
      - main
      - develop

pool:
  vmImage: 'ubuntu-latest'

variables:
  pythonVersion: '3.8'
  databricksHost: $(DATABRICKS_HOST)
  databricksToken: $(DATABRICKS_TOKEN)

stages:
  - stage: Test
    jobs:
      - job: UnitTests
        steps:
          - task: UsePythonVersion@0
            inputs:
              versionSpec: '$(pythonVersion)'

          - script: |
              pip install -r requirements.txt
              pip install pytest pytest-cov
            displayName: 'Install dependencies'

          - script: |
              pytest tests/ --cov=src --cov-report=xml
            displayName: 'Run tests'

          - task: PublishCodeCoverageResults@1
            inputs:
              codeCoverageTool: 'Cobertura'
              summaryFileLocation: 'coverage.xml'

  - stage: Deploy
    dependsOn: Test
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
    jobs:
      - job: DeployNotebooks
        steps:
          - script: |
              pip install databricks-cli
              # The legacy Databricks CLI reads DATABRICKS_HOST and
              # DATABRICKS_TOKEN from the environment, so no interactive
              # `databricks configure` step is needed here.
              databricks workspace import_dir ./notebooks /Shared/project --overwrite
            displayName: 'Deploy to Databricks'
            env:
              DATABRICKS_HOST: $(databricksHost)
              DATABRICKS_TOKEN: $(databricksToken)
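If the host URL and token are managed centrally, a variable group (optionally linked to Azure Key Vault) can replace the inline variable definitions above. A minimal sketch, where the group name databricks-secrets is a placeholder for whatever you create under Pipelines > Library:

```yaml
variables:
  # Hypothetical variable group holding DATABRICKS_HOST / DATABRICKS_TOKEN
  - group: databricks-secrets
  - name: pythonVersion
    value: '3.8'
```

Note that switching to the list form means every variable, including plain ones like pythonVersion, must use the name/value syntax.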

Key Patterns

  1. Separate stages for test and deploy
  2. Conditional deployment only from main branch
  3. Secret variables stored in pipeline variables, not YAML
  4. Artifact publishing (e.g. test and coverage output) for audit trails
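Pattern 4 is not shown in the pipeline above; a sketch of what it could look like as an extra step after the test run (the artifact name is illustrative):

```yaml
          - task: PublishPipelineArtifact@1
            inputs:
              targetPath: 'coverage.xml'   # coverage report produced by pytest
              artifact: 'test-results'
```

Published artifacts are retained with the run, so reviewers and auditors can pull the exact test evidence for any given build.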

Pipeline-as-code means your deployment process is reviewed, versioned, and reproducible.

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.