Azure ML Designer - Visual Machine Learning Pipeline Development

Azure ML Designer is the no-code visual pipeline builder for ML, and its place in the toolbox is narrower than the marketing suggests. It’s genuinely useful for teams who need to prototype a classification or regression pipeline and communicate it visually to stakeholders who don’t read Python: the drag-and-drop canvas maps cleanly to a whiteboard conversation. For production ML, most data scientists I’ve worked with prefer the SDK or notebooks, because Designer’s component library is constrained and debugging a failed Designer pipeline is more opaque than debugging notebook code. That said, custom component integration has improved, and if your organisation has governance reasons to limit arbitrary Python execution, Designer helps enforce that boundary, as long as pipelines stick to built-in and approved components rather than the Execute Python Script and Execute R Script modules, which run arbitrary code.

Understanding Azure ML Designer

Designer offers:

  • Visual pipeline building - Drag-and-drop components
  • Pre-built modules - Data processing, training, evaluation
  • Custom components - Extend with your own code
  • Real-time endpoints - One-click deployment
  • Batch endpoints - Scheduled scoring pipelines

Getting Started

Access Designer

  1. Navigate to Azure ML Studio
  2. Click “Designer” in the left menu
  3. Create a new pipeline or use a sample

Key Concepts

Pipeline = Collection of connected modules
Module = Self-contained processing step
Dataset = Input data for the pipeline
Compute = Resources for running the pipeline
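As a mental model only, a pipeline behaves like a small directed graph: each module runs once all of its upstream inputs are ready, and its output feeds the next module. The toy sketch below shows that idea in plain Python with hypothetical module functions (`load`, `clean`, `total`); Designer itself runs each module as a job on the selected compute target, not as function calls in one process.

```python
from graphlib import TopologicalSorter


# Hypothetical "modules": each takes a list of upstream outputs and returns one output.
def load(_):
    return [3.0, None, 5.0, 7.0]


def clean(inputs):
    (values,) = inputs
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def total(inputs):
    (values,) = inputs
    return sum(values)


modules = {"load": load, "clean": clean, "total": total}
# The "connections": module name -> names of the modules feeding its inputs
edges = {"load": [], "clean": ["load"], "total": ["clean"]}


def run_pipeline(modules, edges):
    """Run modules in dependency order, passing upstream outputs as inputs."""
    outputs = {}
    for name in TopologicalSorter(edges).static_order():
        upstream = [outputs[dep] for dep in edges[name]]
        outputs[name] = modules[name](upstream)
    return outputs


results = run_pipeline(modules, edges)
print(results["total"])  # 20.0
```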

Building a Classification Pipeline

Step 1: Add Dataset

  1. Drag Dataset module from the left panel
  2. Configure:
    • Select registered dataset, OR
    • Use sample dataset (e.g., Adult Census Income)

Step 2: Data Preprocessing

Recommended modules:
  - Select Columns in Dataset:
      Purpose: Choose relevant features
      Configuration: Select by name or by type

  - Clean Missing Data:
      Purpose: Handle null values
      Options:
        - Remove rows
        - Replace with mean/median/mode
        - Replace with custom value

  - Edit Metadata:
      Purpose: Set column types and categories
      Configuration: Mark categorical columns

  - Normalize Data:
      Purpose: Scale numeric features
      Methods:
        - MinMax
        - ZScore
        - Logistic
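To make these options concrete, here are the textbook formulas behind mean replacement and the three normalisation methods listed above, written as plain Python over a list of values. This is a sketch, not Designer’s implementation: the function names are hypothetical, and details such as whether ZScore uses the population or sample standard deviation, or how a constant column is handled, may differ in the real modules.

```python
import math
import statistics


def replace_missing_with_mean(values):
    """Like Clean Missing Data's "Replace with mean": fill None with the mean of the rest."""
    present = [v for v in values if v is not None]
    mean = statistics.fmean(present)
    return [mean if v is None else v for v in values]


def min_max(values):
    """MinMax: rescale to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: this sketch maps everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """ZScore: (x - mean) / std, using the population std (fails on a constant column)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


def logistic(values):
    """Logistic: squash each value through 1 / (1 + e^-x)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


hours = [40, None, 20, 60]
cleaned = replace_missing_with_mean(hours)
print(cleaned)           # [40, 40.0, 20, 60]
print(min_max(cleaned))  # [0.5, 0.5, 0.0, 1.0]
print(z_score(cleaned))
```

Order matters: normalising before filling missing values would leave gaps in the scaled column, which is why Clean Missing Data sits ahead of Normalize Data in the pipeline.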

Step 3: Split Data

Module: Split Data
Configuration:
  Splitting mode: Split Rows
  Fraction of rows in the first output dataset: 0.7
  Random seed: 42
  Stratified split: True (for classification)
  Stratification key column: target (the label)
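Stratification means the 70/30 cut is made separately within each label value, so both outputs keep roughly the original class balance. A minimal sketch, assuming rows are dicts with a hypothetical `target` label; Designer’s own random number generator will not pick the same rows for seed 42.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, label_key, fraction=0.7, seed=42):
    """Split rows so each label keeps roughly the same share in both outputs."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    first, second = [], []
    for label_rows in by_label.values():
        shuffled = label_rows[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * fraction)  # 70% of *this* label's rows
        first.extend(shuffled[:cut])
        second.extend(shuffled[cut:])
    return first, second


# 80 negatives and 20 positives: a 4:1 class imbalance
rows = [{"id": i, "target": int(i % 5 == 0)} for i in range(100)]
train, test = stratified_split(rows, "target")
print(len(train), len(test))                # 70 30
print(Counter(r["target"] for r in train))  # Counter({0: 56, 1: 14})
print(Counter(r["target"] for r in test))   # Counter({0: 24, 1: 6})
```

Both outputs keep the 4:1 ratio; an unstratified split of a small or heavily imbalanced dataset can leave the test set with very few positives.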

Step 4: Train Model

Modules to connect:
  1. Algorithm module:
     - Two-Class Boosted Decision Tree
     - Two-Class Logistic Regression
     - Two-Class Neural Network

  2. Train Model:
     Connect:
       - Algorithm to left input
       - Training data to right input
     Configuration:
       - Label column: target

Step 5: Score and Evaluate

Score Model:
  Connect:
    - Trained model to left input
    - Test data to right input

Evaluate Model:
  Connect:
    - Scored data to left input
    - (Optional) a second model's scored data to right input, for comparison
  Outputs:
    - Accuracy metrics
    - ROC curve
    - Confusion matrix
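The headline numbers Evaluate Model reports for a two-class model are derived from the confusion matrix. A stdlib-only sketch of those formulas, assuming 0/1 labels such as the Scored Labels column that Score Model produces; threshold-dependent outputs like the ROC curve need the scored probabilities and are left out here.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) for a binary classifier."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return tp, fp, fn, tn


def binary_metrics(actual, predicted):
    """Accuracy, precision, recall and F1; a zero denominator yields 0.0."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    accuracy = (tp + tn) / len(actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(actual, predicted))  # (3, 1, 1, 3)
print(binary_metrics(actual, predicted))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```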

Complete Pipeline Example

[Dataset: Customer Data]
        │
        ▼
[Select Columns in Dataset]
        │
        ▼
[Clean Missing Data]
        │
        ▼
[Edit Metadata] ───── Mark categorical columns
        │
        ▼
[Normalize Data]
        │
        ▼
[Split Data] ─────── 70/30 split
    │
    │ left output: 70% training rows
    ▼
[Train Model] ◄── [Two-Class Boosted Decision Tree]
    │
    ▼
[Score Model] ◄── right output of Split Data: 30% test rows
    │
    ▼
[Evaluate Model]

Custom Python Scripts

Execute Python Script Module

# dataframe1: Input dataset 1
# dataframe2: Input dataset 2 (optional)

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder

def azureml_main(dataframe1=None, dataframe2=None):
    """
    Custom preprocessing function
    """
    df = dataframe1.copy()

    # Feature engineering
    df['age_group'] = pd.cut(df['age'],
                             bins=[0, 25, 45, 65, 100],
                             labels=['young', 'middle', 'senior', 'elderly'])

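    # Encode categorical (string) columns as integers.
    # Caution: the encoder is refitted on every run, so a category can get a
    # different code in the inference pipeline than it had during training.
    # For production, prefer a fixed mapping or the Convert to Indicator Values module.
    for col in df.select_dtypes(include=['object']).columns:
        df[col] = LabelEncoder().fit_transform(df[col].astype(str))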

    # Create interaction features
    df['income_per_hour'] = df['income'] / (df['hours_per_week'] + 1)

    # Drop original columns if needed
    df = df.drop(['redundant_column'], axis=1, errors='ignore')

    return df,  # Return tuple of DataFrames

Execute R Script Module

# dataframe1: Input dataset 1
# dataframe2: Input dataset 2 (optional)

azureml_main <- function(dataframe1, dataframe2) {
    # Load libraries
    library(dplyr)

    # Custom preprocessing: add derived features, drop rows without a label
    df <- dataframe1 %>%
        mutate(
            age_squared = age^2,
            log_income = log(income + 1)
        ) %>%
        filter(!is.na(target))

    # Designer expects outputs as a named list (dataset1, dataset2)
    return(list(dataset1 = df))
}

Hyperparameter Tuning

Using Tune Model Hyperparameters

Module: Tune Model Hyperparameters
Configuration:
  Parameter sweeping mode: Entire grid
  # or Random sweep

  Algorithm: Two-Class Decision Forest
  # set the algorithm module's "Create trainer mode" to "Parameter Range"
  Parameter ranges:
    - Number of trees: [50, 100, 200]
    - Maximum depth: [5, 10, 15, 20]
    - Minimum samples per leaf: [1, 5, 10]

  Metric: Accuracy
  # Other options: AUC, Precision, Recall, F1

Connect:
  - Untrained algorithm module (left input)
  - Training data (middle input)
  - Validation data (right input, optional)
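Entire grid trains one model per combination, so the ranges above already mean 3 x 4 x 3 = 36 training runs. The sketch below enumerates that grid and, as a simplification, models Random sweep as sampling a capped number of grid points; Designer’s own random sweep may sample differently, and the dictionary keys are hypothetical names for the three parameters.

```python
import itertools
import random

# Parameter ranges from the Tune Model Hyperparameters example (hypothetical key names)
param_ranges = {
    "number_of_trees": [50, 100, 200],
    "maximum_depth": [5, 10, 15, 20],
    "min_samples_per_leaf": [1, 5, 10],
}


def entire_grid(ranges):
    """Every combination of every range: one training run each."""
    names = list(ranges)
    return [dict(zip(names, combo)) for combo in itertools.product(*ranges.values())]


def random_sweep(ranges, max_runs, seed=0):
    """Simplified random sweep: up to max_runs distinct combinations, chosen at random."""
    grid = entire_grid(ranges)
    return random.Random(seed).sample(grid, k=min(max_runs, len(grid)))


grid = entire_grid(param_ranges)
print(len(grid))                            # 36
print(len(random_sweep(param_ranges, 10)))  # 10
```

The cost of Entire grid multiplies with every range you add, which is the usual reason to switch to a random sweep with a fixed run budget once the grid grows.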

Creating Real-Time Endpoint

Step 1: Create Inference Pipeline

  1. After training pipeline completes
  2. Click “Create inference pipeline” > “Real-time inference pipeline”
  3. Designer auto-generates the inference pipeline

Step 2: Modify Inference Pipeline

Auto-generated pipeline modifications:
  - Removes training modules
  - Adds Web Service Input
  - Adds Web Service Output
  - Keeps preprocessing and scoring modules

Manual adjustments:
  - Remove unnecessary columns from output
  - Add data validation
  - Format output for consumers
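One way to add data validation is an Execute Python Script module placed right after Web Service Input (a pipeline-layout assumption, not a built-in Designer feature). The core check is sketched below on plain dicts to stay dependency-free; the schema (column names, types and ranges) is hypothetical and would come from your own training data.

```python
# Hypothetical schema: column name -> (expected type, inclusive range or None)
SCHEMA = {
    "age": (int, (17, 100)),
    "hours_per_week": (int, (1, 99)),
    "workclass": (str, None),
}


def validate_record(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record looks scoreable."""
    problems = []
    for column, (expected_type, bounds) in schema.items():
        if record.get(column) is None:
            problems.append(f"{column}: missing")
            continue
        value = record[column]
        if not isinstance(value, expected_type):
            problems.append(f"{column}: expected {expected_type.__name__}, got {type(value).__name__}")
            continue
        if bounds is not None and not bounds[0] <= value <= bounds[1]:
            problems.append(f"{column}: {value} outside {bounds}")
    return problems


print(validate_record({"age": 35, "hours_per_week": 40, "workclass": "Private"}))  # []
print(validate_record({"age": 150, "workclass": "Private"}))
# ['age: 150 outside (17, 100)', 'hours_per_week: missing']
```

Returning a list of messages rather than raising on the first problem lets the endpoint report everything wrong with a request in one response.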

Step 3: Deploy

  1. Submit the inference pipeline
  2. After completion, click “Deploy”
  3. Configure endpoint:
Endpoint configuration:
  Name: customer-churn-endpoint
  Description: Real-time churn prediction
  Compute type: Azure Kubernetes Service
  Compute target: Select existing AKS cluster

  Authentication:
    Type: Key-based
    # or Token-based

  Advanced settings:
    Enable Application Insights: Yes
    CPU cores: 0.1
    Memory GB: 0.5

Step 4: Test Endpoint

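import requests

# Endpoint URL and key from the endpoint's Consume tab in Azure ML Studio
scoring_uri = "https://endpoint-url.azureml.net/score"
api_key = "your-api-key"

# Test data: the key under "Inputs" must match the Web Service Input
# module's name ("WebServiceInput0" by default), and each record needs
# every column that input expects
data = {
    "Inputs": {
        "WebServiceInput0": [
            {
                "age": 35,
                "workclass": "Private",
                "education": "Bachelors",
                "occupation": "Tech-support",
                "hours_per_week": 40
            }
        ]
    }
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

response = requests.post(scoring_uri, json=data, headers=headers)
response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body
print(response.json())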

Batch Inference Pipeline

Create Batch Pipeline

Pipeline structure:
  [Dataset (or Data Input)] ─── Batch data source
           │
           ▼
  [Select Columns]
           │
           ▼
  [Apply Transformation] ◄── Trained preprocessing
           │
           ▼
  [Score Model] ◄── Trained model
           │
           ▼
  [Export Data] ─── Output to blob storage

Schedule Batch Pipeline

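from azureml.core import Workspace
from azureml.pipeline.core import Schedule, ScheduleRecurrence

# Uses Azure ML SDK v1 (azureml-core, azureml-pipeline-core)
ws = Workspace.from_config()

# ID of the published batch inference pipeline (copy it from Studio)
pipeline_id = "your-pipeline-id"

# Run once a day at 00:00; hours/minutes apply to "Day" and "Week" frequencies
recurrence = ScheduleRecurrence(
    frequency="Day",
    interval=1,
    hours=[0],
    minutes=[0]
)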

# Create schedule
schedule = Schedule.create(
    workspace=ws,
    name="daily-batch-scoring",
    pipeline_id=pipeline_id,
    experiment_name="batch-scoring",
    recurrence=recurrence
)

Custom Components

Create Reusable Component

# component.yaml
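$schema: https://azuremlschemas.azureedge.net/latest/commandComponent.schema.json
name: custom_feature_engineering  # lowercase letters, digits and underscores only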
display_name: Custom Feature Engineering
version: 1
type: command
inputs:
  input_data:
    type: uri_folder
  config:
    type: string
    default: "default"
outputs:
  output_data:
    type: uri_folder
code: ./src
command: >-
  python feature_engineering.py
  --input-data ${{inputs.input_data}}
  --config ${{inputs.config}}
  --output-data ${{outputs.output_data}}
environment:
  conda_file: ./conda.yaml
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
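The YAML only fixes the script’s interface: `--input-data`, `--config` and `--output-data`. A minimal sketch of what `src/feature_engineering.py` could look like, assuming the input folder holds CSV files with hypothetical `income` and `hours_per_week` columns; the feature itself mirrors the `income_per_hour` column from the Execute Python Script example.

```python
import argparse
import csv
from pathlib import Path


def parse_args(argv=None):
    # Argument names mirror the `command:` section of component.yaml
    parser = argparse.ArgumentParser(description="Custom feature engineering")
    parser.add_argument("--input-data", required=True, help="folder of input CSV files")
    parser.add_argument("--config", default="default")
    parser.add_argument("--output-data", required=True, help="folder for output CSV files")
    return parser.parse_args(argv)


def engineer_row(row):
    """Hypothetical feature: income per working hour."""
    income = float(row["income"])
    hours = float(row["hours_per_week"])
    row["income_per_hour"] = f"{income / (hours + 1):.4f}"
    return row


def run(input_dir, output_dir, config="default"):
    """Add features to every CSV in input_dir and write same-named files to output_dir."""
    print(f"Running feature engineering with config={config!r}")
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(input_dir).glob("*.csv")):
        with src.open(newline="") as f:
            rows = [engineer_row(row) for row in csv.DictReader(f)]
        if not rows:
            continue  # header-only file: nothing to write in this sketch
        with (out / src.name).open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)


def main(argv=None):
    args = parse_args(argv)
    run(args.input_data, args.output_data, args.config)
```

In the real `src/feature_engineering.py`, end the file with the usual `if __name__ == "__main__": main()` guard so the component’s `command` line runs it. The sketch sticks to the standard library; if you switch to pandas, list it in `conda.yaml`.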

Register Component

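from azure.ai.ml import MLClient, load_component
from azure.identity import DefaultAzureCredential

# Replace with your workspace details
subscription_id = "<subscription-id>"
resource_group = "<resource-group>"
workspace_name = "<workspace-name>"

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id,
    resource_group,
    workspace_name
)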

# Load and register component
component = load_component(source="./component.yaml")
registered_component = ml_client.components.create_or_update(component)

print(f"Component: {registered_component.name}")
print(f"Version: {registered_component.version}")

Best Practices

  1. Start with sample pipelines - Learn patterns from examples
  2. Modularize your pipelines - Create reusable components
  3. Version your pipelines - Track changes over time
  4. Test with subset data - Iterate faster during development
  5. Document modules - Add descriptions and comments
  6. Monitor endpoints - Enable Application Insights
  7. Use consistent naming - Clear module and pipeline names
  8. Validate data early - Catch issues before training

When to Use Designer vs Code

Use Designer when:
  - Rapid prototyping
  - Non-programmer team members
  - Simple to medium complexity pipelines
  - Standard preprocessing steps
  - Quick deployment needed

Use Code (SDK) when:
  - Complex custom logic
  - Version control requirements
  - CI/CD integration
  - Advanced debugging needs
  - Team with coding expertise

Conclusion

Azure ML Designer provides a capable visual interface for building machine learning pipelines. While it’s positioned as a no-code tool, custom Python/R scripts and reusable components let it handle more sophisticated workflows. Combining visual development with code modules where needed gets a prototype to a deployed endpoint quickly; for complex production pipelines that need CI/CD and deeper debugging, the SDK remains the better fit.

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.