
AI and Machine Learning Trends: What 2021 Taught Us

2021 was a pivotal year for AI and machine learning. Foundation models captured headlines, MLOps matured, and responsible AI became a board-level concern. Let’s examine the trends that defined the year.

Foundation Models and Transfer Learning

Large pre-trained models became the starting point for most ML projects:

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers import Trainer, TrainingArguments

# Fine-tuning a pre-trained model - the 2021 way
model_name = "microsoft/deberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=3
)

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    evaluation_strategy="epoch"
)

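# train_dataset and eval_dataset are assumed to be tokenized Hugging Face Dataset objects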
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset
)

trainer.train()
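
Once training completes, a typical next step is to evaluate the model and persist it for deployment. A minimal sketch using the same Trainer API (the output directory is illustrative):

# Evaluate on the held-out set and save the fine-tuned model plus tokenizer
metrics = trainer.evaluate()
print(metrics)

trainer.save_model("./fine_tuned_model")
tokenizer.save_pretrained("./fine_tuned_model")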

MLOps Became Standard Practice

Production ML involves more than model training. Data preparation, feature engineering, training, and evaluation need to run as a repeatable pipeline:

# Azure ML Pipeline - the production pattern
$schema: https://azuremlschemas.azureedge.net/latest/pipelineJob.schema.json
type: pipeline
display_name: customer_churn_training

settings:
  default_compute: azureml:gpu-cluster

jobs:
  data_prep:
    type: command
    component: azureml:data_prep@latest
    inputs:
      raw_data:
        type: uri_folder
        path: azureml:customer_data@latest
    outputs:
      prepared_data:
        type: uri_folder

  feature_engineering:
    type: command
    component: azureml:feature_engineering@latest
    inputs:
      input_data: ${{parent.jobs.data_prep.outputs.prepared_data}}
    outputs:
      features:
        type: uri_folder

  train_model:
    type: command
    component: azureml:train_xgboost@latest
    inputs:
      training_data: ${{parent.jobs.feature_engineering.outputs.features}}
      learning_rate: 0.1
      max_depth: 6
    outputs:
      model:
        type: mlflow_model

  evaluate:
    type: command
    component: azureml:model_evaluation@latest
    inputs:
      model: ${{parent.jobs.train_model.outputs.model}}
      test_data: ${{parent.jobs.feature_engineering.outputs.features}}
    outputs:
      evaluation_results:
        type: uri_folder
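
A pipeline definition like this still has to be submitted to the workspace. A minimal sketch using the Python SDK v2, assuming the YAML above is saved as pipeline.yml and the workspace identifiers are placeholders:

from azure.ai.ml import MLClient, load_job
from azure.identity import DefaultAzureCredential

# Connect to the workspace (identifiers are placeholders)
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="your-sub",
    resource_group_name="your-rg",
    workspace_name="your-workspace"
)

# Load the YAML pipeline definition and submit it as a job
pipeline_job = load_job("pipeline.yml")
submitted = ml_client.jobs.create_or_update(pipeline_job)
print(f"Submitted pipeline job: {submitted.name}")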

Responsible AI Matured

Ethics and fairness moved from research to practice:

from fairlearn.metrics import MetricFrame, selection_rate, demographic_parity_difference
from fairlearn.postprocessing import ThresholdOptimizer
from sklearn.metrics import accuracy_score, precision_score

# Assess fairness metrics
metric_frame = MetricFrame(
    metrics={
        "selection_rate": selection_rate,
        "accuracy": accuracy_score,
        "precision": precision_score
    },
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=test_sensitive
)

print("Metrics by group:")
print(metric_frame.by_group)

# Calculate demographic parity difference
dp_diff = demographic_parity_difference(
    y_test,
    y_pred,
    sensitive_features=test_sensitive
)
print(f"Demographic Parity Difference: {dp_diff:.4f}")

# Apply threshold optimization for fairness
postprocessor = ThresholdOptimizer(
    estimator=base_model,
    constraints="demographic_parity",
    prefit=True
)
postprocessor.fit(X_train, y_train, sensitive_features=train_sensitive)
y_pred_fair = postprocessor.predict(X_test, sensitive_features=test_sensitive)
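
It's worth re-checking the disparity on the mitigated predictions to confirm the post-processing actually narrowed the gap. A short follow-up using the same fairlearn metric:

# Compare disparity before and after threshold optimization
dp_diff_fair = demographic_parity_difference(
    y_test,
    y_pred_fair,
    sensitive_features=test_sensitive
)
print(f"Demographic Parity Difference (original):  {dp_diff:.4f}")
print(f"Demographic Parity Difference (mitigated): {dp_diff_fair:.4f}")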

AutoML and Democratization

AutoML became production-ready:

from azure.ai.ml import MLClient, automl
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="your-sub",
    resource_group_name="your-rg",
    workspace_name="your-workspace"
)

# AutoML classification job
classification_job = automl.classification(
    compute="cpu-cluster",
    experiment_name="customer_churn_automl",
    training_data=training_data,
    target_column_name="churned",
    primary_metric="AUC_weighted",
    enable_model_explainability=True,
    enable_early_termination=True,
    n_cross_validations=5
)

classification_job.set_limits(
    timeout_minutes=60,
    trial_timeout_minutes=20,
    max_trials=50,
    max_concurrent_trials=4
)

classification_job.set_featurization(mode="auto")

returned_job = ml_client.jobs.create_or_update(classification_job)
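
Submission returns immediately; a brief sketch for monitoring the run from code with the same client (the streaming call blocks until the job finishes):

# Stream logs until the AutoML run completes
ml_client.jobs.stream(returned_job.name)
print(f"AutoML job finished: {returned_job.name}")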

Edge AI Gained Momentum

Inference at the edge became practical:

import onnxruntime as ort
import numpy as np

# ONNX Runtime for edge deployment
session_options = ort.SessionOptions()
session_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
session_options.intra_op_num_threads = 4

# Load optimized model
session = ort.InferenceSession(
    "model_quantized.onnx",
    session_options,
    providers=['CPUExecutionProvider']
)

def predict(input_data):
    input_name = session.get_inputs()[0].name
    output_name = session.get_outputs()[0].name

    # Prepare input
    input_array = np.array(input_data, dtype=np.float32)

    # Run inference
    result = session.run([output_name], {input_name: input_array})
    return result[0]

# Model runs efficiently on edge devices
prediction = predict(sensor_readings)
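
The quantized model loaded above has to be produced beforehand. A minimal sketch using ONNX Runtime's dynamic quantization (file names match the example and are otherwise illustrative):

from onnxruntime.quantization import quantize_dynamic, QuantType

# Shrink the float32 model to int8 weights for edge deployment
quantize_dynamic(
    "model.onnx",            # full-precision model exported from training
    "model_quantized.onnx",  # quantized model consumed by the session above
    weight_type=QuantType.QInt8
)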

Key Observations from 2021

  1. Fine-tuning over Training from Scratch: Pre-trained checkpoints cut training cost and usually improve accuracy, so few teams start from zero
  2. MLOps is Table Stakes: Production ML requires proper engineering
  3. Explainability is Required: Black boxes are increasingly unacceptable
  4. Hybrid Deployment: Cloud training with edge inference is the dominant pattern (see the export sketch below)
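
For point 4, the bridge between cloud training and edge inference is typically a framework-neutral export step. A minimal PyTorch-to-ONNX sketch, where trained_model and the input shape are placeholders for your own model:

import torch

# Export a trained PyTorch model to ONNX so it can run under ONNX Runtime at the edge
dummy_input = torch.randn(1, 16)  # placeholder input shape
torch.onnx.export(
    trained_model,       # placeholder: your trained torch.nn.Module
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=13
)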

The Road Ahead

Looking to 2022:

  • Larger foundation models with better efficiency
  • Federated learning for privacy-preserving ML
  • More sophisticated model monitoring
  • AI regulation becoming concrete

2021 proved that AI is no longer experimental - it’s infrastructure. The focus has shifted from “can we do ML?” to “how do we do ML responsibly and reliably?”

Michael John Pena

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.