Developer Experience for Data and AI Teams

Developer experience (DevEx) has become a competitive advantage for data and AI teams: teams with a great DevEx ship faster and with fewer errors. Let's explore how to build an excellent developer experience for data and AI work.

The DevEx Pyramid

              ┌─────────────┐
              │   Delight   │  Nice-to-haves that make work enjoyable
              ├─────────────┤
              │ Productivity│  Tools that accelerate work
              ├─────────────┤
              │  Usability  │  Easy to learn and use
              ├─────────────┤
              │Functionality│  Does what's needed
              ├─────────────┤
              │ Reliability │  Works consistently
              └─────────────┘

Key DevEx Elements

1. Development Environment

# Modern data development environment setup
dev_environment = {
    "local_development": {
        "ide": "VS Code with extensions",
        "extensions": [
            "Python",
            "Jupyter",
            "Azure Tools",
            "Power BI Tools",
            "SQL Tools",
            "Git Lens",
            "Copilot"
        ],
        "containers": "Dev containers for consistent environments"
    },

    "cloud_development": {
        "notebooks": "Fabric Notebooks / Databricks",
        "sql": "Fabric SQL / Synapse",
        "spark": "Managed Spark pools"
    },

    "local_testing": {
        "databases": "Docker containers",
        "storage": "Azurite emulator",
        "data": "Sample data generators"
    }
}

# devcontainer.json for data projects
devcontainer_config = {
    "name": "Data Engineering Environment",
    "image": "mcr.microsoft.com/devcontainers/python:3.11",
    "features": {
        "ghcr.io/devcontainers/features/azure-cli:1": {},
        "ghcr.io/devcontainers/features/docker-in-docker:2": {}
    },
    "customizations": {
        "vscode": {
            "extensions": [
                "ms-python.python",
                "ms-toolsai.jupyter",
                "ms-azuretools.vscode-azure",
                "ms-mssql.mssql",
                "github.copilot"
            ]
        }
    },
    "postCreateCommand": "pip install -r requirements-dev.txt"
}
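
Once the container builds, it is worth verifying the environment before anyone starts work. A minimal smoke-test sketch, run after postCreateCommand (the module list is an assumption; match it to your requirements-dev.txt):

# scripts/check_env.py: hypothetical smoke test for the dev container
import importlib.util
import sys

REQUIRED_PYTHON = (3, 11)
REQUIRED_MODULES = ["pandas", "pyspark", "pytest"]  # assumption: adjust to your stack

def main() -> int:
    if sys.version_info[:2] < REQUIRED_PYTHON:
        print(f"Need Python {REQUIRED_PYTHON}, found {sys.version_info[:2]}")
        return 1
    missing = [m for m in REQUIRED_MODULES if importlib.util.find_spec(m) is None]
    if missing:
        print(f"Missing modules: {', '.join(missing)}")
        return 1
    print("Environment OK")
    return 0

if __name__ == "__main__":
    sys.exit(main())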

2. Fast Feedback Loops

class FastFeedbackPipeline:
    """Enable fast feedback for data development."""

    feedback_stages = {
        "instant": {
            "time": "< 1 second",
            "checks": [
                "Syntax highlighting",
                "Linting (ruff, sqlfluff)",
                "Type checking",
                "Auto-formatting"
            ]
        },

        "fast": {
            "time": "< 30 seconds",
            "checks": [
                "Unit tests",
                "Schema validation",
                "Sample data tests"
            ]
        },

        "medium": {
            "time": "< 5 minutes",
            "checks": [
                "Integration tests",
                "Data quality tests",
                "Performance benchmarks"
            ]
        },

        "thorough": {
            "time": "< 30 minutes",
            "checks": [
                "Full regression tests",
                "End-to-end tests",
                "Security scans"
            ]
        }
    }

    def setup_pre_commit_hooks(self):
        """Configure pre-commit for instant feedback."""
        return """
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.1.6
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format

  - repo: https://github.com/sqlfluff/sqlfluff
    rev: 2.3.5
    hooks:
      - id: sqlfluff-lint
        args: [--dialect, sparksql]

  - repo: local
    hooks:
      - id: pytest-quick
        name: Quick Tests
        entry: pytest tests/unit -x --timeout=10
        language: system
        pass_filenames: false
        """

    def setup_watch_mode(self):
        """Configure watch mode for continuous feedback."""
        return """
# Using pytest-watch for continuous testing
ptw --runner "pytest tests/unit -x" --ignore data/

# dbt has no built-in watch mode; pair it with a file watcher such as watchexec
watchexec -e sql -- dbt compile

# Jupyter Lab already gives interactive, cell-level feedback
jupyter lab
        """

3. Documentation and Discovery

class DocumentationSystem:
    """Make information discoverable."""

    documentation_layers = {
        "inline": {
            "purpose": "Immediate context",
            "format": "Code comments, docstrings",
            "example": "Function and class documentation"
        },

        "readme": {
            "purpose": "Project overview",
            "format": "README.md in each folder",
            "example": "How to run, configure, deploy"
        },

        "guides": {
            "purpose": "How-to instructions",
            "format": "Markdown in docs/ folder",
            "example": "Step-by-step tutorials"
        },

        "reference": {
            "purpose": "Complete API/schema docs",
            "format": "Auto-generated",
            "example": "dbt docs, API specs"
        },

        "portal": {
            "purpose": "Central discovery",
            "format": "Developer portal",
            "example": "Backstage, internal wiki"
        }
    }

    def generate_documentation(self, project_path: str):
        """Generate documentation automatically (needs `import subprocess` at module top)."""

        # Generate Python API docs with pdoc
        subprocess.run(["pdoc", f"{project_path}/src", "-o", "docs/api"], check=True)

        # Generate dbt docs
        subprocess.run(["dbt", "docs", "generate"], check=True)

        # Generate data catalog
        self.generate_data_catalog(project_path)

        # Generate architecture diagrams
        self.generate_diagrams(project_path)

    def create_data_catalog_entry(self, table: dict) -> dict:
        """Create data catalog documentation."""
        return {
            "name": table["name"],
            "description": table["description"],
            "schema": table["schema"],
            "owner": table["owner"],
            "domain": table["domain"],
            "quality_score": table["quality_score"],
            "sample_queries": self.generate_sample_queries(table),
            "lineage": self.get_lineage(table),
            "usage_examples": self.get_usage_examples(table)
        }
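
The cheapest layer to keep current is the inline one: a well-formed docstring shows up in editor hover help immediately and feeds the pdoc-generated reference for free. A minimal example (the function itself is illustrative):

import pandas as pd

def deduplicate_events(events: pd.DataFrame, key: str = "event_id") -> pd.DataFrame:
    """Remove duplicate events, keeping the most recent occurrence.

    Args:
        events: Raw event records with an `ingested_at` timestamp column.
        key: Column that uniquely identifies an event.

    Returns:
        DataFrame with one row per `key`; the latest `ingested_at` wins.
    """
    return events.sort_values("ingested_at").drop_duplicates(key, keep="last")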

4. Templates and Scaffolding

class ProjectScaffolder:
    """Scaffold new projects quickly."""

    templates = {
        "data_pipeline": {
            "description": "Standard data pipeline project",
            "structure": """
            {project_name}/
            ├── src/
            │   ├── extract/
            │   ├── transform/
            │   └── load/
            ├── tests/
            │   ├── unit/
            │   └── integration/
            ├── config/
            │   ├── dev.yml
            │   └── prod.yml
            ├── docs/
            ├── .github/workflows/
            ├── pyproject.toml
            └── README.md
            """,
            "includes": [
                "CI/CD pipeline",
                "Testing framework",
                "Logging setup",
                "Config management"
            ]
        },

        "ml_project": {
            "description": "ML project with MLOps",
            "structure": """
            {project_name}/
            ├── src/
            │   ├── features/
            │   ├── models/
            │   └── evaluation/
            ├── notebooks/
            ├── tests/
            ├── mlflow/
            ├── config/
            └── README.md
            """
        },

        "ai_application": {
            "description": "AI application (RAG, agents)",
            "structure": """
            {project_name}/
            ├── src/
            │   ├── agents/
            │   ├── prompts/
            │   └── tools/
            ├── tests/
            ├── evaluations/
            ├── config/
            └── README.md
            """
        }
    }

    def scaffold(self, template: str, project_name: str, config: dict):
        """Create new project from template."""
        template_def = self.templates[template]

        # Create directory structure
        self.create_structure(template_def["structure"], project_name)

        # Copy template files
        self.copy_template_files(template, project_name)

        # Customize for project
        self.customize_files(project_name, config)

        # Initialize git
        self.init_git(project_name)

        # Install dependencies
        self.install_dependencies(project_name)

        return f"Project {project_name} created from {template} template"
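
In practice you rarely need to hand-roll the scaffolder: cookiecutter (or copier) already handles prompting, templating, and file generation. A sketch, assuming hypothetical internal template repositories:

# Thin wrapper around cookiecutter; the repo URLs are assumptions
from cookiecutter.main import cookiecutter

TEMPLATE_REPOS = {
    "data_pipeline": "https://github.com/your-org/template-data-pipeline",
    "ml_project": "https://github.com/your-org/template-ml-project",
}

def scaffold(template: str, project_name: str) -> None:
    cookiecutter(
        TEMPLATE_REPOS[template],
        no_input=True,  # skip interactive prompts; values come from extra_context
        extra_context={"project_slug": project_name},
    )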

5. AI-Assisted Development

class AIAssistedDevelopment:
    """Integrate AI assistance into workflow."""

    capabilities = {
        "code_completion": {
            "tool": "GitHub Copilot",
            "use_cases": [
                "SQL query completion",
                "Python function generation",
                "Test case generation"
            ]
        },

        "code_explanation": {
            "tool": "Copilot Chat / Claude",
            "use_cases": [
                "Explain complex SQL",
                "Debug errors",
                "Understand legacy code"
            ]
        },

        "documentation_generation": {
            "tool": "AI assistants",
            "use_cases": [
                "Generate docstrings",
                "Create README content",
                "Write API documentation"
            ]
        },

        "review_assistance": {
            "tool": "AI code review",
            "use_cases": [
                "Suggest improvements",
                "Identify bugs",
                "Check best practices"
            ]
        }
    }

    # Example: AI-assisted SQL development
    def ai_sql_assistant(self, natural_language_query: str, schema: dict) -> str:
        """Generate SQL from natural language."""
        prompt = f"""Generate SQL for this request:
        Request: {natural_language_query}
        Schema: {schema}
        Rules: Use Spark SQL syntax, include comments"""

        return self.ai_client.generate(prompt)
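
A concrete ai_sql_assistant is a thin wrapper around whatever LLM endpoint you use. A sketch against the OpenAI Python SDK (the model name is an assumption; any chat-completion API works the same way):

# Sketch: SQL generation via the OpenAI Python SDK
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ai_sql_assistant(natural_language_query: str, schema: dict) -> str:
    """Generate Spark SQL from a natural-language request."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: substitute your deployed model
        messages=[
            {"role": "system", "content": "You write Spark SQL. Include comments."},
            {"role": "user", "content": f"Request: {natural_language_query}\nSchema: {schema}"},
        ],
    )
    return response.choices[0].message.content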

Measuring Developer Experience

devex_metrics = {
    "productivity": {
        "time_to_first_commit": "Time from project start to first commit",
        "deployment_frequency": "How often code reaches production",
        "lead_time": "Commit to production time",
        "change_failure_rate": "% of deployments causing issues"
    },

    "satisfaction": {
        "developer_nps": "Would you recommend this platform?",
        "friction_score": "How many blockers encountered?",
        "toil_time": "Time spent on non-productive tasks"
    },

    "quality": {
        "test_coverage": "% of code tested",
        "documentation_coverage": "% of code documented",
        "incident_rate": "Production issues per deployment"
    }
}

# Target benchmarks
devex_targets = {
    "time_to_first_commit": "< 1 day",
    "deployment_frequency": "Daily or more",
    "lead_time": "< 1 day",
    "change_failure_rate": "< 5%",
    "developer_nps": "> 50"
}
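
A simple scorecard that checks measured values against these targets keeps the metrics honest. A sketch (the measured numbers are placeholders):

# Compare measured DevEx metrics to targets; measured values are placeholders
measured = {
    "lead_time_hours": 30.0,
    "change_failure_rate_pct": 7.5,
    "developer_nps": 42,
}

targets = {
    "lead_time_hours": ("<=", 24.0),
    "change_failure_rate_pct": ("<=", 5.0),
    "developer_nps": (">=", 50),
}

for metric, (op, target) in targets.items():
    value = measured[metric]
    ok = value <= target if op == "<=" else value >= target
    print(f"{metric}: {value} (target {op} {target}) -> {'PASS' if ok else 'FAIL'}")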

Great developer experience is an investment that pays dividends in productivity, quality, and retention. Measure it and improve it continuously.

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.