AI and Machine Learning Trends: What 2021 Taught Us
2021 was a pivotal year for AI and machine learning. Foundation models captured headlines, MLOps matured, and responsible AI became a board-level concern. Let’s examine the trends that defined the year.
Foundation Models and Transfer Learning
Large pre-trained models became the starting point for most ML projects:
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers import Trainer, TrainingArguments

# Fine-tuning a pre-trained model - the 2021 way
model_name = "microsoft/deberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=3
)

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    evaluation_strategy="epoch"
)

# train_dataset and eval_dataset are tokenized datasets prepared beforehand
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset
)

trainer.train()
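Once training finishes, inference takes only a few more lines. A minimal sketch of using the fine-tuned model on new text; the example sentence is illustrative, and what the label ids mean depends on your training data:

import torch

# Run the fine-tuned model on new text (sketch; label meaning depends on your dataset)
inputs = tokenizer(
    "The product arrived late and damaged.",
    return_tensors="pt",
    truncation=True
)
with torch.no_grad():
    logits = model(**inputs).logits
predicted_label = logits.argmax(dim=-1).item()
print(f"Predicted label id: {predicted_label}")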
MLOps Became Standard Practice
Production ML requires more than just model training:
# Azure ML Pipeline - the production pattern
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: customer_churn_training

settings:
  default_compute: azureml:gpu-cluster

jobs:
  data_prep:
    type: command
    component: azureml:data_prep@latest
    inputs:
      raw_data:
        type: uri_folder
        path: azureml:customer_data@latest
    outputs:
      prepared_data:
        type: uri_folder

  feature_engineering:
    type: command
    component: azureml:feature_engineering@latest
    inputs:
      input_data: ${{parent.jobs.data_prep.outputs.prepared_data}}
    outputs:
      features:
        type: uri_folder

  train_model:
    type: command
    component: azureml:train_xgboost@latest
    inputs:
      training_data: ${{parent.jobs.feature_engineering.outputs.features}}
      learning_rate: 0.1
      max_depth: 6
    outputs:
      model:
        type: mlflow_model

  evaluate:
    type: command
    component: azureml:model_evaluation@latest
    inputs:
      model: ${{parent.jobs.train_model.outputs.model}}
      test_data: ${{parent.jobs.feature_engineering.outputs.features}}
    outputs:
      evaluation_results:
        type: uri_folder
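Defining the pipeline as YAML keeps it version-controlled, and submitting it is a one-liner with the CLI (az ml job create --file pipeline.yml) or a few lines of SDK code. A minimal sketch, assuming the YAML above is saved as pipeline.yml and the workspace identifiers are placeholders:

from azure.ai.ml import MLClient, load_job
from azure.identity import DefaultAzureCredential

# Connect to the workspace (placeholder identifiers) and submit the pipeline YAML
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="your-sub",
    resource_group_name="your-rg",
    workspace_name="your-workspace"
)
pipeline_job = load_job("pipeline.yml")
submitted = ml_client.jobs.create_or_update(pipeline_job)
print(f"Submitted pipeline job: {submitted.name}")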
Responsible AI Matured
Ethics and fairness moved from research to practice:
from fairlearn.metrics import MetricFrame, selection_rate, demographic_parity_difference
from fairlearn.postprocessing import ThresholdOptimizer
from sklearn.metrics import accuracy_score, precision_score
import pandas as pd

# y_test, y_pred, sensitive_features and base_model come from an earlier
# train/test split and a baseline classifier

# Assess fairness metrics
metric_frame = MetricFrame(
    metrics={
        "selection_rate": selection_rate,
        "accuracy": accuracy_score,
        "precision": precision_score
    },
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=sensitive_features
)
print("Metrics by group:")
print(metric_frame.by_group)

# Calculate demographic parity difference
dp_diff = demographic_parity_difference(
    y_test,
    y_pred,
    sensitive_features=sensitive_features
)
print(f"Demographic Parity Difference: {dp_diff:.4f}")

# Apply threshold optimization for fairness
postprocessor = ThresholdOptimizer(
    estimator=base_model,
    constraints="demographic_parity",
    prefit=True
)
postprocessor.fit(X_train, y_train, sensitive_features=train_sensitive)
y_pred_fair = postprocessor.predict(X_test, sensitive_features=test_sensitive)
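To verify the mitigation actually moved the needle, the same parity metric can be recomputed on the adjusted predictions; a short sketch reusing the objects defined above:

# Re-check demographic parity after post-processing (sketch)
dp_diff_after = demographic_parity_difference(
    y_test,
    y_pred_fair,
    sensitive_features=test_sensitive
)
print(f"Demographic Parity Difference after mitigation: {dp_diff_after:.4f}")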
AutoML and Democratization
AutoML became production-ready:
from azure.ai.ml import MLClient, automl
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="your-sub",
    resource_group_name="your-rg",
    workspace_name="your-workspace"
)

# AutoML classification job
classification_job = automl.classification(
    compute="cpu-cluster",
    experiment_name="customer_churn_automl",
    training_data=training_data,
    target_column_name="churned",
    primary_metric="AUC_weighted",
    enable_model_explainability=True,
    n_cross_validations=5
)

# Early termination is configured as a limit in SDK v2
classification_job.set_limits(
    timeout_minutes=60,
    trial_timeout_minutes=20,
    max_trials=50,
    max_concurrent_trials=4,
    enable_early_termination=True
)

classification_job.set_featurization(mode="auto")

returned_job = ml_client.jobs.create_or_update(classification_job)
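Submission is asynchronous, so the run can be followed from code and the best model picked up once it completes. A brief sketch using the ml_client and returned_job from above:

# Stream logs until the AutoML job finishes, then check its final status (sketch)
ml_client.jobs.stream(returned_job.name)
completed_job = ml_client.jobs.get(returned_job.name)
print(f"AutoML job {completed_job.name} finished with status: {completed_job.status}")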
Edge AI Gained Momentum
Inference at the edge became practical:
import onnxruntime as ort
import numpy as np

# ONNX Runtime for edge deployment
session_options = ort.SessionOptions()
session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
session_options.intra_op_num_threads = 4

# Load optimized model
session = ort.InferenceSession(
    "model_quantized.onnx",
    session_options,
    providers=['CPUExecutionProvider']
)

def predict(input_data):
    input_name = session.get_inputs()[0].name
    output_name = session.get_outputs()[0].name
    # Prepare input
    input_array = np.array(input_data, dtype=np.float32)
    # Run inference
    result = session.run([output_name], {input_name: input_array})
    return result[0]

# Model runs efficiently on edge devices
prediction = predict(sensor_readings)
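The model_quantized.onnx file loaded above can be produced ahead of time with ONNX Runtime's dynamic quantization; a minimal sketch, assuming model.onnx is an already-exported full-precision model and both file paths are placeholders:

from onnxruntime.quantization import quantize_dynamic, QuantType

# Convert weights to int8 to shrink the model for edge devices (sketch; paths are placeholders)
quantize_dynamic(
    model_input="model.onnx",
    model_output="model_quantized.onnx",
    weight_type=QuantType.QInt8
)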
Key Observations from 2021
- Fine-tuning over Training from Scratch: Starting from a pre-trained model saves data, compute, and time compared with training from zero
- MLOps is Table Stakes: Production ML requires proper engineering
- Explainability is Required: Black boxes are increasingly unacceptable (see the sketch after this list)
- Hybrid Deployment: Training in the cloud and running inference at the edge became the dominant pattern
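On the explainability point, libraries like SHAP made per-prediction explanations routine. A minimal sketch, assuming a tree-based model and a test feature matrix named X_test (both hypothetical here):

import shap

# Attribute each prediction to its input features (sketch; `model` and `X_test` are assumed)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Global view: which features drive the model's behaviour overall
shap.summary_plot(shap_values, X_test)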
The Road Ahead
Looking to 2022:
- Larger foundation models with better efficiency
- Federated learning for privacy-preserving ML
- More sophisticated model monitoring
- AI regulation becoming concrete
2021 proved that AI is no longer experimental - it’s infrastructure. The focus has shifted from “can we do ML?” to “how do we do ML responsibly and reliably?”