April 23, 2021

Azure ML Designer - Visual Machine Learning Pipeline Development


Azure Machine Learning Designer provides a drag-and-drop interface for building machine learning pipelines. It bridges the gap between no-code accessibility and enterprise-grade ML capabilities. Today, I want to show you how to leverage Designer for rapid prototyping and production workflows.

Understanding Azure ML Designer

Designer offers:

Visual pipeline building - Drag-and-drop components
Pre-built modules - Data processing, training, evaluation
Custom components - Extend with your own code
Real-time endpoints - One-click deployment
Batch endpoints - Scheduled scoring pipelines

Getting Started

Access Designer

Navigate to Azure ML Studio
Click “Designer” in the left menu
Create a new pipeline or use a sample

Key Concepts

Pipeline = Collection of connected modules
Module = Self-contained processing step
Dataset = Input data for the pipeline
Compute = Resources for running the pipeline

Building a Classification Pipeline

Step 1: Add Dataset

Drag Dataset module from the left panel
Configure:
- Select registered dataset, OR
- Use sample dataset (e.g., Adult Census Income)

Step 2: Data Preprocessing

Recommended modules:
  - Select Columns in Dataset:
      Purpose: Choose relevant features
      Configuration: Select by name or by type

  - Clean Missing Data:
      Purpose: Handle null values
      Options:
        - Remove rows
        - Replace with mean/median/mode
        - Replace with custom value

  - Edit Metadata:
      Purpose: Set column types and categories
      Configuration: Mark categorical columns

  - Normalize Data:
      Purpose: Scale numeric features
      Methods:
        - MinMax
        - ZScore
        - Logistic
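
To make the three Normalize Data methods concrete, here is a minimal pure-Python sketch of the formulas they are commonly understood to apply. The function names are illustrative and not part of Designer, and the ZScore variant assumes a population standard deviation, which may differ from what the module uses internally.

```python
import math
import statistics


def min_max(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Center on the mean and divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population std
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def logistic(values):
    """Squash each value into (0, 1) with the logistic function."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


hours = [20, 40, 60]
print(min_max(hours))
print(z_score(hours))
print(logistic([0.0]))
```

MinMax and ZScore depend on statistics of the training data, so the same fitted transformation should be reused at scoring time rather than recomputed on new data.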

Step 3: Split Data

Module: Split Data
Configuration:
  Splitting mode: Split Rows
  Fraction of rows: 0.7
  Random seed: 42
  Stratified split: Yes (for classification)
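
As a rough illustration of what a stratified 70/30 split with a fixed seed does, the sketch below splits each label group separately so that both outputs keep the original class proportions. The `stratified_split` helper and the sample rows are hypothetical; Designer's own implementation may differ in detail.

```python
import random
from collections import defaultdict


def stratified_split(rows, label_key, fraction=0.7, seed=42):
    """Split rows into two lists, preserving label proportions."""
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    first, second = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        cut = round(len(group) * fraction)
        first.extend(group[:cut])
        second.extend(group[cut:])
    return first, second


# Hypothetical data: 100 rows, 20 of them in the positive class.
rows = [{"id": i, "target": int(i % 5 == 0)} for i in range(100)]
train, test = stratified_split(rows, "target")
print(len(train), len(test))
print(sum(r["target"] for r in train), sum(r["target"] for r in test))
```

Without stratification, a rare class can end up under-represented in the test set purely by chance, which makes the evaluation metrics noisier.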

Step 4: Train Model

Modules to connect:
  1. Algorithm module:
     - Two-Class Boosted Decision Tree
     - Two-Class Logistic Regression
     - Two-Class Neural Network

  2. Train Model:
     Connect:
       - Algorithm to left input
       - Training data to right input
     Configuration:
       - Label column: target

Step 5: Score and Evaluate

Score Model:
  Connect:
    - Trained model to left input
    - Test data to right input

Evaluate Model:
  Connect:
    - Scored data to input
  Outputs:
    - Accuracy metrics
    - ROC curve
    - Confusion matrix
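
The metrics reported by Evaluate Model can be reproduced by hand from the scored labels. The following sketch, with hypothetical function names and made-up labels, shows how the confusion matrix counts lead to accuracy, precision, recall, and F1 for a two-class problem.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, tn, fn) for a two-class problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(actual, predicted, positive=1):
    """Derive the usual summary metrics from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for illustration only.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
print(confusion_counts(actual, predicted))
print(classification_metrics(actual, predicted))
```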

Complete Pipeline Example

[Dataset: Customer Data]
        │
        ▼
[Select Columns in Dataset]
        │
        ▼
[Clean Missing Data]
        │
        ▼
[Edit Metadata] ───── Mark categorical columns
        │
        ▼
[Normalize Data]
        │
        ▼
[Split Data] ─────── 70/30 split
    │       │
    │       └── Test data (30%) ──► to [Score Model]
    ▼
[Train Model] ◄── [Two-Class Boosted Decision Tree]
    │
    ▼
[Score Model] ◄───── Test Data
    │
    ▼
[Evaluate Model]
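
One way to reason about this pipeline is as a directed acyclic graph in which a module can only run once all of its inputs are ready. The sketch below encodes the diagram's connections in a plain dictionary (an assumption made for illustration, not a Designer API) and uses the standard library's `graphlib` (Python 3.9+) to derive a valid execution order.

```python
from graphlib import TopologicalSorter

# Each module maps to the set of modules whose output it consumes.
dependencies = {
    "Select Columns in Dataset": {"Dataset"},
    "Clean Missing Data": {"Select Columns in Dataset"},
    "Edit Metadata": {"Clean Missing Data"},
    "Normalize Data": {"Edit Metadata"},
    "Split Data": {"Normalize Data"},
    "Train Model": {"Split Data", "Two-Class Boosted Decision Tree"},
    "Score Model": {"Train Model", "Split Data"},
    "Evaluate Model": {"Score Model"},
}

# static_order() yields every module after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
for step, module in enumerate(order, start=1):
    print(step, module)
```

Note that Split Data feeds both Train Model (training rows) and Score Model (test rows), which is why it appears twice as a dependency.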

Custom Python Scripts

Execute Python Script Module

# dataframe1: Input dataset 1
# dataframe2: Input dataset 2 (optional)

import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder

def azureml_main(dataframe1=None, dataframe2=None):
    """
    Custom preprocessing function
    """
    df = dataframe1.copy()

    # Feature engineering
    df['age_group'] = pd.cut(df['age'],
                             bins=[0, 25, 45, 65, 100],
                             labels=['young', 'middle', 'senior', 'elderly'])

    # Encode categorical variables
    le = LabelEncoder()
    for col in df.select_dtypes(include=['object']).columns:
        df[col] = le.fit_transform(df[col].astype(str))

    # Create interaction features
    df['income_per_hour'] = df['income'] / (df['hours_per_week'] + 1)

    # Drop original columns if needed
    df = df.drop(['redundant_column'], axis=1, errors='ignore')

    return df,  # Return tuple of DataFrames

Execute R Script Module

# dataframe1: Input dataset 1
# dataframe2: Input dataset 2 (optional)

azureml_main <- function(dataframe1, dataframe2) {
    # Load libraries
    library(dplyr)
    library(tidyr)

    # Custom preprocessing
    df <- dataframe1 %>%
        mutate(
            age_squared = age^2,
            log_income = log(income + 1)
        ) %>%
        filter(!is.na(target))

    # Return result as a named list (Designer expects the key "dataset1")
    return(list(dataset1 = df))
}

Hyperparameter Tuning

Using Tune Model Hyperparameters

Module: Tune Model Hyperparameters
Configuration:
  Parameter sweeping mode: Entire grid
  # or Random sweep

  Algorithm: Two-Class Decision Forest
  Parameter ranges:
    - Number of trees: [50, 100, 200]
    - Maximum depth: [5, 10, 15, 20]
    - Minimum samples per leaf: [1, 5, 10]

  Metric: Accuracy
  # Other options: AUC, Precision, Recall, F1

Connect:
  - Untrained model (algorithm module with trainer mode set to Parameter Range)
  - Training data
  - Validation data (optional)
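
To get a feel for the cost of each sweeping mode, the sketch below enumerates the parameter ranges listed above. The dictionary keys and helper names are illustrative, not Designer identifiers. An entire-grid sweep trains one model per combination, while a random sweep trains only a sampled subset.

```python
import itertools
import random

# Parameter ranges from the configuration above (key names are illustrative).
param_ranges = {
    "number_of_trees": [50, 100, 200],
    "maximum_depth": [5, 10, 15, 20],
    "min_samples_per_leaf": [1, 5, 10],
}


def entire_grid(ranges):
    """Return every combination of the given parameter values."""
    names = list(ranges)
    combos = itertools.product(*(ranges[n] for n in names))
    return [dict(zip(names, combo)) for combo in combos]


def random_sweep(ranges, n_runs, seed=0):
    """Return n_runs distinct combinations sampled from the grid."""
    grid = entire_grid(ranges)
    return random.Random(seed).sample(grid, min(n_runs, len(grid)))


grid = entire_grid(param_ranges)
print(len(grid))  # 3 * 4 * 3 = 36 training runs
for params in random_sweep(param_ranges, n_runs=5):
    print(params)
```

Because the grid size is the product of the range lengths, adding one more parameter with four values would quadruple the number of runs, which is usually the point at which a random sweep becomes the more practical choice.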

Creating Real-Time Endpoint

Step 1: Create Inference Pipeline

Wait for the training pipeline run to complete
Click “Create inference pipeline” > “Real-time inference pipeline”
Designer auto-generates the inference pipeline

Step 2: Modify Inference Pipeline

Auto-generated pipeline modifications:
  - Removes training modules
  - Adds Web Service Input
  - Adds Web Service Output
  - Keeps preprocessing and scoring modules

Manual adjustments:
  - Remove unnecessary columns from output
  - Add data validation
  - Format output for consumers

Step 3: Deploy

Submit the inference pipeline
After completion, click “Deploy”
Configure endpoint:

Endpoint configuration:
  Name: customer-churn-endpoint
  Description: Real-time churn prediction
  Compute type: Azure Kubernetes Service
  Compute target: Select existing AKS cluster

  Authentication:
    Type: Key-based
    # or Token-based

  Advanced settings:
    Enable Application Insights: Yes
    CPU cores: 0.1
    Memory GB: 0.5

Step 4: Test Endpoint

import requests
import json

# Endpoint URL and key from Azure ML Studio
scoring_uri = "https://endpoint-url.azureml.net/score"
api_key = "your-api-key"

# Test data
data = {
    "Inputs": {
        "data": [
            {
                "age": 35,
                "workclass": "Private",
                "education": "Bachelors",
                "occupation": "Tech-support",
                "hours_per_week": 40
            }
        ]
    }
}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

response = requests.post(scoring_uri, json=data, headers=headers, timeout=30)
response.raise_for_status()  # Fail loudly on 4xx/5xx instead of parsing an error body
print(response.json())

Batch Inference Pipeline

Create Batch Pipeline

Pipeline structure:
  [Dataset (or Data Input)] ─── Batch data source
           │
           ▼
  [Select Columns]
           │
           ▼
  [Apply Transformation] ◄── Trained preprocessing
           │
           ▼
  [Score Model] ◄── Trained model
           │
           ▼
  [Export Data] ─── Output to blob storage

Schedule Batch Pipeline

from azureml.core import Workspace
from azureml.pipeline.core import Schedule, ScheduleRecurrence

ws = Workspace.from_config()

# Get published pipeline
pipeline_id = "your-pipeline-id"

# Create recurrence
recurrence = ScheduleRecurrence(
    frequency="Day",
    interval=1,
    time_of_day="00:00"  # hh:mm; start_time expects a full datetime instead
)

# Create schedule
schedule = Schedule.create(
    workspace=ws,
    name="daily-batch-scoring",
    pipeline_id=pipeline_id,
    experiment_name="batch-scoring",
    recurrence=recurrence
)

Custom Components

Create Reusable Component

# component.yaml
name: custom_feature_engineering
display_name: Custom Feature Engineering
version: 1
type: command
inputs:
  input_data:
    type: uri_folder
  config:
    type: string
    default: "default"
outputs:
  output_data:
    type: uri_folder
code: ./src
command: >-
  python feature_engineering.py
  --input-data ${{inputs.input_data}}
  --config ${{inputs.config}}
  --output-data ${{outputs.output_data}}
environment:
  conda_file: ./conda.yaml
  image: mcr.microsoft.com/azureml/base:latest
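
The component definition refers to a `feature_engineering.py` script inside `./src` that is not shown here. A minimal sketch of what such a script could look like follows, assuming the input folder contains CSV files with an `age` column. The `add_features` step is hypothetical, and the `--config` value is parsed but not used. The only parts dictated by the YAML are the three command-line arguments.

```python
import argparse
import csv
import sys
from pathlib import Path


def parse_args(argv=None):
    """Parse the arguments named in the component's command line."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-data", required=True)
    parser.add_argument("--config", default="default")
    parser.add_argument("--output-data", required=True)
    return parser.parse_args(argv)


def add_features(row):
    """Hypothetical feature step: add age_squared when age is numeric."""
    out = dict(row)
    try:
        out["age_squared"] = str(float(row["age"]) ** 2)
    except (KeyError, TypeError, ValueError):
        out["age_squared"] = ""
    return out


def main(argv=None):
    args = parse_args(argv)
    in_dir, out_dir = Path(args.input_data), Path(args.output_data)
    out_dir.mkdir(parents=True, exist_ok=True)
    # uri_folder inputs and outputs arrive as directory paths.
    for src in sorted(in_dir.glob("*.csv")):
        with src.open(newline="") as f:
            rows = [add_features(r) for r in csv.DictReader(f)]
        if not rows:
            continue
        with (out_dir / src.name).open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)


# Run only when arguments are supplied, so the file can also be imported.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Accepting `argv` as a parameter keeps the script testable locally, for example `main(["--input-data", "in", "--output-data", "out"])`, before it is registered as a component.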

Register Component

from azure.ai.ml import MLClient, load_component
from azure.identity import DefaultAzureCredential

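# Placeholders: replace with your own workspace details
subscription_id = "your-subscription-id"
resource_group = "your-resource-group"
workspace_name = "your-workspace-name"

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id=subscription_id,
    resource_group_name=resource_group,
    workspace_name=workspace_name,
)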

# Load and register component
component = load_component(source="./component.yaml")
registered_component = ml_client.components.create_or_update(component)

print(f"Component: {registered_component.name}")
print(f"Version: {registered_component.version}")

Best Practices

Start with sample pipelines - Learn patterns from examples
Modularize your pipelines - Create reusable components
Version your pipelines - Track changes over time
Test with subset data - Iterate faster during development
Document modules - Add descriptions and comments
Monitor endpoints - Enable Application Insights
Use consistent naming - Clear module and pipeline names
Validate data early - Catch issues before training

When to Use Designer vs Code

Use Designer when:
  - Rapid prototyping
  - Non-programmer team members
  - Simple to medium complexity pipelines
  - Standard preprocessing steps
  - Quick deployment needed

Use Code (SDK) when:
  - Complex custom logic
  - Version control requirements
  - CI/CD integration
  - Advanced debugging needs
  - Team with coding expertise

Conclusion

Azure ML Designer provides a powerful visual interface for building machine learning pipelines. While it’s often positioned as a no-code tool, it supports sophisticated workflows through custom Python/R scripts and reusable components. By combining visual development with code modules where needed, you can build production-ready ML solutions rapidly.