Model Cards: Documenting AI for Transparency and Trust
As AI systems become more prevalent, documenting them becomes crucial. Model cards provide a standardized way to communicate what an AI model does, how it performs, and its limitations. Let’s explore how to create effective model cards.
What is a Model Card?
A model card is a documentation framework for machine learning models, inspired by nutrition labels on food. It provides essential information about a model’s:
- Intended use and users
- Performance metrics
- Limitations and biases
- Ethical considerations
Model Card Structure
Basic Template
# Model Card: [Model Name]
## Model Details
**Name:** Customer Churn Predictor v2.3
**Type:** Binary Classification
**Framework:** scikit-learn / XGBoost
**Version:** 2.3.0
**Date:** 2022-12-08
**Owner:** Data Science Team (data-science@company.com)
### Description
This model predicts the probability of customer churn within the next 30 days
based on customer behavior and account attributes.
### Architecture
- Algorithm: XGBoost Classifier
- Features: 45 engineered features
- Output: Probability score (0-1) and binary prediction
## Intended Use
### Primary Use Cases
- Identify at-risk customers for proactive retention campaigns
- Prioritize customer success team outreach
- Trigger automated retention workflows
### Primary Users
- Customer Success Team
- Marketing Automation Systems
- Executive Dashboards
### Out-of-Scope Uses
- Credit decisions
- Insurance underwriting
- Employment decisions
- Any automated decision without human review
## Training Data
### Dataset Description
- **Source:** Internal CRM and transaction database
- **Time Period:** January 2020 - October 2022
- **Size:** 500,000 customer records
- **Label Definition:** Customer who cancelled within 30 days of prediction date
### Preprocessing
- Missing values imputed using median (numeric) and mode (categorical)
- Outliers capped at 99th percentile
- Categorical features one-hot encoded
## Performance
### Overall Metrics
| Metric | Value |
|--------|-------|
| Accuracy | 0.847 |
| Precision | 0.723 |
| Recall | 0.689 |
| F1 Score | 0.706 |
| AUC-ROC | 0.891 |
### Performance by Segment
| Segment | Accuracy | AUC-ROC | Sample Size |
|---------|----------|---------|-------------|
| Enterprise | 0.862 | 0.903 | 45,000 |
| SMB | 0.841 | 0.887 | 180,000 |
| Consumer | 0.839 | 0.882 | 275,000 |

| Tenure | Accuracy | AUC-ROC | Sample Size |
|--------|----------|---------|-------------|
| < 6 months | 0.798 | 0.845 | 85,000 |
| 6-24 months | 0.856 | 0.901 | 220,000 |
| > 24 months | 0.871 | 0.912 | 195,000 |
## Ethical Considerations
### Fairness Evaluation
Model evaluated for disparate impact across:
- Customer tier (no significant disparity)
- Geographic region (see limitations)
- Account age (see limitations)
### Known Biases
- Lower accuracy for customers with < 6 months tenure
- May underperform for customers in newly launched regions
### Mitigation Steps
- Separate model being developed for new customers
- Human review required for customers in new regions
## Limitations
### Technical Limitations
- Requires at least 30 days of customer history
- Performance degrades for customers with very low activity
- Does not account for seasonal patterns in churn
### Deployment Limitations
- Predictions should be refreshed weekly minimum
- Not designed for real-time inference
- Requires feature engineering pipeline to be operational
### Known Failure Modes
- Customers who churn due to external factors (e.g., acquisition)
- Customers with sudden behavior changes
- Bulk enterprise cancellations
## Recommendations
### For Users
- Use predictions as one input to retention decisions, not the sole factor
- Review low-confidence predictions manually
- Consider customer feedback alongside model predictions
### For Operators
- Monitor drift in feature distributions monthly
- Retrain model quarterly or when AUC drops below 0.85
- Maintain fallback rules for system outages
## Updates and Changelog
| Version | Date | Changes |
|---------|------|---------|
| 2.3.0 | 2022-12-01 | Added 10 new features, retrained on 2022 data |
| 2.2.0 | 2022-06-01 | Fixed label leakage issue |
| 2.1.0 | 2022-03-01 | Initial production deployment |
## Contact
- **Owner:** Data Science Team
- **Email:** data-science@company.com
- **Issues:** https://github.com/company/churn-model/issues
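The preprocessing steps listed in the card above (median/mode imputation, 99th-percentile capping, one-hot encoding) can be sketched in plain Python. This is an illustrative stand-in, not the production pipeline, which would typically use pandas or scikit-learn; `preprocess`, `numeric_keys`, and `categorical_keys` are hypothetical names:

```python
from statistics import median, mode

def preprocess(records, numeric_keys, categorical_keys):
    """Apply the card's stated preprocessing to a list of dicts:
    median/mode imputation, 99th-percentile capping, one-hot encoding."""
    for key in numeric_keys:
        observed = sorted(r[key] for r in records if r[key] is not None)
        fill = median(observed)
        # Cap at (approximately) the 99th percentile of observed values
        cap = observed[min(len(observed) - 1, int(0.99 * len(observed)))]
        for r in records:
            value = fill if r[key] is None else r[key]
            r[key] = min(value, cap)
    for key in categorical_keys:
        observed = [r[key] for r in records if r[key] is not None]
        fill = mode(observed)  # most common value; ties resolve to first seen
        levels = sorted(set(observed))
        for r in records:
            value = r[key] if r[key] is not None else fill
            for level in levels:
                r[f"{key}_{level}"] = int(value == level)
            del r[key]
    return records
```

Whatever the implementation, the key point for the model card is that these steps are recorded, so predictions can be reproduced and audited.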
Generating Model Cards Programmatically
from dataclasses import dataclass, field
from typing import List, Dict, Optional
from datetime import date
import json
@dataclass
class PerformanceMetrics:
    accuracy: float
    precision: float
    recall: float
    f1_score: float
    auc_roc: float
@dataclass
class SegmentPerformance:
    segment_name: str
    segment_value: str
    metrics: PerformanceMetrics
    sample_size: int
@dataclass
class ModelCard:
    # Model details
    name: str
    version: str
    model_type: str
    framework: str
    owner: str
    description: str
    date_created: date = field(default_factory=date.today)

    # Intended use
    primary_uses: List[str] = field(default_factory=list)
    primary_users: List[str] = field(default_factory=list)
    out_of_scope_uses: List[str] = field(default_factory=list)

    # Training data
    data_source: str = ""
    data_timeframe: str = ""
    data_size: int = 0
    preprocessing_steps: List[str] = field(default_factory=list)

    # Performance
    overall_metrics: Optional[PerformanceMetrics] = None
    segment_performance: List[SegmentPerformance] = field(default_factory=list)

    # Ethics
    fairness_evaluation: str = ""
    known_biases: List[str] = field(default_factory=list)
    mitigation_steps: List[str] = field(default_factory=list)

    # Limitations
    technical_limitations: List[str] = field(default_factory=list)
    deployment_limitations: List[str] = field(default_factory=list)
    failure_modes: List[str] = field(default_factory=list)

    # Recommendations
    user_recommendations: List[str] = field(default_factory=list)
    operator_recommendations: List[str] = field(default_factory=list)
    def to_markdown(self) -> str:
        """Generate a markdown model card."""
        md = f"""# Model Card: {self.name}
## Model Details
**Name:** {self.name}
**Version:** {self.version}
**Type:** {self.model_type}
**Framework:** {self.framework}
**Date:** {self.date_created}
**Owner:** {self.owner}
### Description
{self.description}
## Intended Use
### Primary Use Cases
{self._list_to_md(self.primary_uses)}
### Primary Users
{self._list_to_md(self.primary_users)}
### Out-of-Scope Uses
{self._list_to_md(self.out_of_scope_uses)}
## Training Data
- **Source:** {self.data_source}
- **Time Period:** {self.data_timeframe}
- **Size:** {self.data_size:,} records
### Preprocessing
{self._list_to_md(self.preprocessing_steps)}
## Performance
### Overall Metrics
{self._metrics_table(self.overall_metrics)}
### Performance by Segment
{self._segment_table(self.segment_performance)}
## Ethical Considerations
### Fairness Evaluation
{self.fairness_evaluation}
### Known Biases
{self._list_to_md(self.known_biases)}
### Mitigation Steps
{self._list_to_md(self.mitigation_steps)}
## Limitations
### Technical Limitations
{self._list_to_md(self.technical_limitations)}
### Deployment Limitations
{self._list_to_md(self.deployment_limitations)}
### Known Failure Modes
{self._list_to_md(self.failure_modes)}
## Recommendations
### For Users
{self._list_to_md(self.user_recommendations)}
### For Operators
{self._list_to_md(self.operator_recommendations)}
"""
        return md
    def _list_to_md(self, items: List[str]) -> str:
        return "\n".join(f"- {item}" for item in items)
    def _metrics_table(self, metrics: Optional[PerformanceMetrics]) -> str:
        if not metrics:
            return "No metrics available"
        return f"""| Metric | Value |
|--------|-------|
| Accuracy | {metrics.accuracy:.3f} |
| Precision | {metrics.precision:.3f} |
| Recall | {metrics.recall:.3f} |
| F1 Score | {metrics.f1_score:.3f} |
| AUC-ROC | {metrics.auc_roc:.3f} |"""
    def _segment_table(self, segments: List[SegmentPerformance]) -> str:
        if not segments:
            return "No segment analysis available"
        rows = ["| Segment | Value | Accuracy | AUC-ROC | Sample Size |",
                "|---------|-------|----------|---------|-------------|"]
        for seg in segments:
            rows.append(f"| {seg.segment_name} | {seg.segment_value} | "
                        f"{seg.metrics.accuracy:.3f} | {seg.metrics.auc_roc:.3f} | "
                        f"{seg.sample_size:,} |")
        return "\n".join(rows)
    def to_json(self) -> str:
        """Export the model card as JSON."""
        from dataclasses import asdict
        # asdict recurses into nested dataclasses (metrics, segments);
        # default=str handles the date field
        return json.dumps(asdict(self), default=str, indent=2)
# Usage example
def create_model_card_from_training(model, X_test, y_test, metadata: dict) -> ModelCard:
    """Create a model card from a trained model and test data."""
    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, roc_auc_score)

    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1] if hasattr(model, 'predict_proba') else y_pred

    metrics = PerformanceMetrics(
        accuracy=accuracy_score(y_test, y_pred),
        precision=precision_score(y_test, y_pred),
        recall=recall_score(y_test, y_pred),
        f1_score=f1_score(y_test, y_pred),
        auc_roc=roc_auc_score(y_test, y_prob)
    )

    card = ModelCard(
        name=metadata['name'],
        version=metadata['version'],
        model_type=metadata['type'],
        framework=type(model).__name__,
        owner=metadata['owner'],
        description=metadata['description'],
        overall_metrics=metrics,
        **metadata.get('additional_info', {})
    )
    return card
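One subtlety when exporting nested dataclasses to JSON: `dataclasses.asdict` recurses into child dataclasses, whereas `self.__dict__` leaves them as objects for `default=str` to flatten into unreadable strings. A self-contained sketch of the round trip, using a stripped-down stand-in for the full `ModelCard` (the `MiniCard` and `Metrics` names are illustrative):

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import List, Optional

@dataclass
class Metrics:  # stand-in for PerformanceMetrics
    accuracy: float
    auc_roc: float

@dataclass
class MiniCard:  # stripped-down stand-in for the full ModelCard
    name: str
    version: str
    date_created: date = field(default_factory=date.today)
    overall_metrics: Optional[Metrics] = None
    known_biases: List[str] = field(default_factory=list)

card = MiniCard(
    name="Customer Churn Predictor",
    version="2.3.0",
    date_created=date(2022, 12, 8),
    overall_metrics=Metrics(accuracy=0.847, auc_roc=0.891),
    known_biases=["Lower accuracy for customers with < 6 months tenure"],
)

# asdict recurses into the nested Metrics; default=str handles the date
payload = json.dumps(asdict(card), default=str, indent=2)
restored = json.loads(payload)
```

The restored dictionary keeps the nested metrics as a proper JSON object, which makes the card queryable by downstream tooling.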
Integrating Model Cards with Azure ML
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Model

def register_model_with_card(
    ml_client: MLClient,
    model_path: str,
    model_card: ModelCard
) -> Model:
    """Register a model in Azure ML with its model card as metadata."""
    model = Model(
        path=model_path,
        name=model_card.name,
        version=model_card.version,
        description=model_card.description,
        tags={
            "model_type": model_card.model_type,
            "framework": model_card.framework,
            "owner": model_card.owner,
            "auc_roc": str(model_card.overall_metrics.auc_roc)
        },
        properties={
            "model_card_json": model_card.to_json()
        }
    )
    registered = ml_client.models.create_or_update(model)

    # Also save the model card as an artifact alongside the model
    with open(f"{model_path}/model_card.md", "w") as f:
        f.write(model_card.to_markdown())
    return registered
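The operator guidance in the card (monitor feature drift monthly, retrain when AUC-ROC drops below 0.85) can be wired into a simple automated check. The sketch below uses the population stability index (PSI), one common drift measure; the 0.2 PSI threshold is a conventional rule of thumb, and the function names are illustrative:

```python
import math

AUC_RETRAIN_THRESHOLD = 0.85  # retraining floor from the card's operator recommendations

def psi(expected, actual, bins=10):
    """Population stability index between a baseline (training-time) sample
    and a live sample of one feature; > 0.2 is a conventional drift signal."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature
    def proportions(values):
        counts = [0] * bins
        for v in values:
            i = min(bins - 1, max(0, int((v - lo) / width)))
            counts[i] += 1
        # Small smoothing term avoids log(0) for empty bins
        return [(c + 1e-6) / (len(values) + 1e-6 * bins) for c in counts]
    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

def needs_retraining(current_auc, feature_psi, psi_threshold=0.2):
    """Trigger retraining on the card's AUC floor or on drift in any feature."""
    return current_auc < AUC_RETRAIN_THRESHOLD or any(
        p > psi_threshold for p in feature_psi.values()
    )
```

A scheduled job can run this against recent predictions and raise an alert (or kick off a retraining pipeline) when the check fires, keeping the card's stated maintenance policy enforced rather than aspirational.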
Conclusion
Model cards are essential for responsible AI deployment. They provide transparency, enable informed decisions about model use, and support regulatory compliance. Make model card creation part of your standard ML workflow; your future self and your users will thank you.