
Azure DevOps YAML Pipelines for Data Projects

YAML pipelines in Azure DevOps bring pipeline-as-code to your data projects: the CI/CD definition lives in version control alongside the code it builds and deploys.

Basic Pipeline Structure

trigger:
  branches:
    include:
      - main
      - develop

pool:
  vmImage: 'ubuntu-latest'

variables:
  pythonVersion: '3.8'
  databricksHost: $(DATABRICKS_HOST)
  databricksToken: $(DATABRICKS_TOKEN)

stages:
  - stage: Test
    jobs:
      - job: UnitTests
        steps:
          - task: UsePythonVersion@0
            inputs:
              versionSpec: '$(pythonVersion)'

          - script: |
              pip install -r requirements.txt
              pip install pytest pytest-cov
            displayName: 'Install dependencies'

          - script: |
              pytest tests/ --cov=src --cov-report=xml
            displayName: 'Run tests'

          - task: PublishCodeCoverageResults@1
            inputs:
              codeCoverageTool: 'Cobertura'
              summaryFileLocation: 'coverage.xml'

  - stage: Deploy
    dependsOn: Test
    condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
    jobs:
      - job: DeployNotebooks
        steps:
          - script: |
              pip install databricks-cli
              # The legacy Databricks CLI reads DATABRICKS_HOST and
              # DATABRICKS_TOKEN from the environment, so no interactive
              # `databricks configure` step is needed here.
              databricks workspace import_dir ./notebooks /Shared/project --overwrite
            displayName: 'Deploy to Databricks'
            env:
              DATABRICKS_HOST: $(databricksHost)
              DATABRICKS_TOKEN: $(databricksToken)
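If the host URL and token are managed centrally, a variable group (optionally linked to Azure Key Vault) can replace the inline variable definitions above. A minimal sketch, where the group name databricks-secrets is a placeholder for whatever you create under Pipelines > Library:

```yaml
variables:
  # Hypothetical variable group holding DATABRICKS_HOST / DATABRICKS_TOKEN
  - group: databricks-secrets
  - name: pythonVersion
    value: '3.8'
```

Note that switching to the list form means every variable, including plain ones like pythonVersion, must use the name/value syntax.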

Key Patterns

  1. Separate stages for test and deploy
  2. Conditional deployment only from main branch
  3. Secret variables stored in pipeline variables, not YAML
  4. Artifact publishing (e.g. test and coverage output) for audit trails
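Pattern 4 is not shown in the pipeline above; a sketch of what it could look like as an extra step after the test run (the artifact name is illustrative):

```yaml
          - task: PublishPipelineArtifact@1
            inputs:
              targetPath: 'coverage.xml'   # coverage report produced by pytest
              artifact: 'test-results'
```

Published artifacts are retained with the run, so reviewers and auditors can pull the exact test evidence for any given build.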

Pipeline-as-code means your deployment process is reviewed, versioned, and reproducible.

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.