Windows AI Evolution: NPU, Recall, and On-Device Intelligence

Windows is becoming an AI-native operating system. With NPUs (Neural Processing Units) in new PCs and features like Windows Recall, AI capabilities are moving from the cloud to the edge. Let’s explore what this means for developers and data professionals.

The Windows AI Stack

┌────────────────────────────────────────┐
│  Applications                          │
├────────────────────────────────────────┤
│  Windows Copilot + AI APIs             │
├────────────────────────────────────────┤
│  Windows ML                            │
├────────────────────────────────────────┤
│  DirectML + ONNX Runtime               │
├────────────────────────────────────────┤
│  Hardware: NPU / GPU / CPU             │
└────────────────────────────────────────┘

Neural Processing Units (NPUs)

NPUs provide dedicated AI acceleration:

import onnxruntime as ort
import numpy as np

# Check available execution providers
providers = ort.get_available_providers()
print(f"Available: {providers}")
# ['DmlExecutionProvider', 'CPUExecutionProvider']  # DML = DirectML (NPU/GPU)

# Load model with DirectML (NPU/GPU) acceleration, falling back to CPU.
# Note: the DmlExecutionProvider ships in the onnxruntime-directml package.
session_options = ort.SessionOptions()
session = ort.InferenceSession(
    "model.onnx",
    sess_options=session_options,
    providers=['DmlExecutionProvider', 'CPUExecutionProvider']
)

# Run inference (the input name must match the model's declared input)
def run_inference(input_data: np.ndarray):
    return session.run(None, {"input": input_data})

# NPU provides:
# - Lower latency than cloud
# - No network dependency
# - Better privacy (data stays local)
# - Lower power consumption than GPU
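
To sanity-check those latency and power claims on your own machine, a rough comparison of the DirectML and CPU execution providers takes only stock onnxruntime; "model.onnx" and the input shape are placeholders for your own model:

import time
import numpy as np
import onnxruntime as ort

def time_provider(provider: str, runs: int = 50) -> float:
    """Average inference latency in milliseconds for one execution provider."""
    sess = ort.InferenceSession("model.onnx", providers=[provider])
    x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape
    sess.run(None, {"input": x})  # warm-up run
    start = time.perf_counter()
    for _ in range(runs):
        sess.run(None, {"input": x})
    return (time.perf_counter() - start) / runs * 1000

for p in ["DmlExecutionProvider", "CPUExecutionProvider"]:
    print(f"{p}: {time_provider(p):.1f} ms")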

Windows Recall for Developers

Windows Recall captures and indexes your screen activity:

# Hypothetical Recall API sketch - no public developer API has been announced

from windows.ai import RecallClient

recall = RecallClient()

# Search your activity history
results = recall.search(
    query="data pipeline architecture diagram",
    time_range="last_week",
    app_filter=["PowerPoint", "Visio", "Browser"]
)

for result in results:
    print(f"Found at: {result.timestamp}")
    print(f"App: {result.application}")
    print(f"Screenshot: {result.screenshot_path}")
    print(f"Extracted text: {result.text}")

# For data professionals:
# - Find that query you ran last Tuesday
# - Recover the dashboard configuration you were viewing
# - Search for specific data patterns you saw

On-Device Language Models

Small language models running locally:

# Illustrative API only - the windows.ai module and LocalLLM class are
# hypothetical stand-ins for a local-inference SDK.
from windows.ai import LocalLLM

# Initialize local model (runs on NPU)
llm = LocalLLM(
    model="phi-3-mini",  # Small but capable
    device="npu"
)

# Local inference - no cloud required
# (transcript_text is assumed to hold the text to summarize)
response = llm.generate(
    prompt="Summarize this meeting transcript:",
    context=transcript_text,
    max_tokens=200
)

# Benefits:
# - Works offline
# - Lower latency: no network round trip (~100ms local vs ~500ms cloud is typical, workload-dependent)
# - No API costs
# - Data never leaves device

# Trade-offs:
# - Smaller model = less capable
# - Limited context window
# - Fixed model (no easy updates)
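
The closest shipping equivalent to the LocalLLM sketch today is the onnxruntime-genai package, which runs ONNX builds of small models such as Phi-3 Mini on local hardware. The snippet below follows its published Phi-3 examples; the exact API has shifted between versions, and the model folder name is a placeholder:

import onnxruntime_genai as og

# Folder containing a downloaded ONNX build of Phi-3 Mini (placeholder path)
model = og.Model("phi-3-mini-4k-instruct-onnx")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=200)
params.input_ids = tokenizer.encode("Summarize this meeting transcript: ...")

# Generate token by token until the model finishes
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))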

Hybrid Local + Cloud AI

# Illustrative API only - HybridAI is hypothetical, and azure_openai_client
# is assumed to be an already-configured Azure OpenAI client.
from windows.ai import HybridAI

ai = HybridAI(
    local_model="phi-3-mini",
    cloud_model="gpt-4o",
    cloud_client=azure_openai_client
)

async def smart_process(text: str, complexity: str = "auto"):
    """Route to appropriate model based on task."""

    if complexity == "auto":
        # Let the system decide
        return await ai.process(text)

    elif complexity == "simple":
        # Use local NPU
        return await ai.local.generate(text)

    else:
        # Use cloud for complex tasks
        return await ai.cloud.generate(text)

# Automatic routing based on:
# - Task complexity
# - Network availability
# - Latency requirements
# - Cost constraints
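
The routing logic itself need not be exotic. A minimal heuristic router, using prompt length as a complexity proxy and a TCP probe for connectivity (both my assumptions, not a Windows API), could look like this:

import socket

def is_online(host: str = "8.8.8.8", port: int = 53, timeout: float = 1.0) -> bool:
    """Cheap connectivity probe: can we open a TCP socket to a public resolver?"""
    try:
        socket.create_connection((host, port), timeout=timeout).close()
        return True
    except OSError:
        return False

def choose_route(text: str, max_local_chars: int = 2000) -> str:
    """Short prompts stay on the NPU; long ones go to the cloud when reachable."""
    if len(text) <= max_local_chars:
        return "local"
    return "cloud" if is_online() else "local"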

Windows ML for Data Processing

# Note: winml here is an illustrative stand-in; in practice the Windows ML
# WinRT APIs (Windows.AI.MachineLearning) are reached from Python through a
# projection package, and the real names differ from this sketch.
import winml
import pandas as pd
import numpy as np

# Train a model and deploy locally
class LocalMLPipeline:
    def __init__(self):
        self.model = None

    def train(self, data: pd.DataFrame, target: str):
        """Train model and export to ONNX."""
        from sklearn.ensemble import RandomForestClassifier
        from skl2onnx import convert_sklearn
        from skl2onnx.common.data_types import FloatTensorType

        # FloatTensorType expects float32 features
        X = data.drop(columns=[target]).astype(np.float32)
        y = data[target]

        clf = RandomForestClassifier()
        clf.fit(X, y)

        # Convert to ONNX for Windows ML
        onnx_model = convert_sklearn(
            clf,
            initial_types=[('input', FloatTensorType([None, X.shape[1]]))]
        )

        with open("model.onnx", "wb") as f:
            f.write(onnx_model.SerializeToString())

    def deploy_local(self):
        """Deploy model for NPU inference."""
        self.model = winml.LearningModel.load_from_file("model.onnx")

    def predict(self, features: np.ndarray) -> np.ndarray:
        """Run inference on NPU."""
        binding = winml.LearningModelBinding(self.model)
        binding.bind("input", winml.TensorFloat.create_from_array(features))

        result = self.model.evaluate(binding)
        return result.outputs["output"].get_as_vector_view()
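
If those winml bindings aren't available in your environment, the exported model.onnx can be served with plain onnxruntime instead, which is fully supported from Python. By default skl2onnx names a classifier's outputs output_label and output_probability:

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "model.onnx",
    providers=['DmlExecutionProvider', 'CPUExecutionProvider']
)

features = np.random.rand(5, 4).astype(np.float32)  # match your training feature count
labels = sess.run(["output_label"], {"input": features})[0]
print(labels)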

DirectML for Custom AI

import torch
import torch_directml

# Use the DirectML backend for PyTorch (DirectX 12 GPUs; pip install torch-directml)
device = torch_directml.device()

# MyCustomModel stands in for your own nn.Module
model = MyCustomModel().to(device)

# Standard training loop on the DirectML device
# (dataloader, criterion, and optimizer are assumed to be defined as usual)
for batch in dataloader:
    inputs = batch['input'].to(device)
    targets = batch['target'].to(device)

    outputs = model(inputs)
    loss = criterion(outputs, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# DirectML supports:
# - Training and inference
# - PyTorch and TensorFlow
# - Works on AMD, Intel, and NVIDIA hardware
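
Before training, it's worth confirming which DirectML device PyTorch will actually use; device_count and device_name are helpers the torch-directml package exposes:

import torch
import torch_directml

print(f"DirectML available: {torch_directml.is_available()}")
for i in range(torch_directml.device_count()):
    print(f"Device {i}: {torch_directml.device_name(i)}")

# Sanity check: run a tiny matmul on the DirectML device
x = torch.randn(2, 3).to(torch_directml.device())
print((x @ x.T).cpu())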

Privacy and Security

# On-device AI enhances privacy

# Illustrative API only - SecureAI is hypothetical, sketching what a
# privacy-focused local inference wrapper could look like.
from windows.ai import SecureAI

secure_ai = SecureAI(
    model="phi-3-mini",
    data_encryption=True,
    secure_enclave=True  # Hardware-backed protection for keys and weights
)

# Sensitive data processing stays local (call from inside an async function)
result = await secure_ai.process(
    data=sensitive_document,
    task="summarize",
    allow_cloud=False  # Data never leaves the device
)

# Security properties to look for in an on-device AI stack:
# - Data encryption at rest and in transit
# - Secure enclave for model weights
# - No telemetry on processed data
# - Audit logging for compliance
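
The SecureAI wrapper above is speculative, but encryption at rest for cached prompts and outputs is achievable today with standard tooling. A minimal sketch using the cryptography package (my choice of library, not a Windows AI API):

from cryptography.fernet import Fernet

# In practice, protect this key with DPAPI or a TPM-backed store
key = Fernet.generate_key()
fernet = Fernet(key)

summary = b"Q3 revenue discussion: ..."
encrypted = fernet.encrypt(summary)  # ciphertext is safe to cache on disk
assert fernet.decrypt(encrypted) == summary  # round-trips to the original bytes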

Developer Considerations

When to Use On-Device AI

use_cases = {
    "on_device_preferred": [
        "Real-time text suggestions",
        "Local document search",
        "Privacy-sensitive analysis",
        "Offline capabilities needed",
        "Low-latency requirements"
    ],
    "cloud_preferred": [
        "Complex reasoning tasks",
        "Large context windows needed",
        "Access to latest models",
        "Heavy compute requirements",
        "Multi-modal advanced tasks"
    ],
    "hybrid": [
        "Start local, escalate to cloud",
        "Cloud train, local inference",
        "Batch cloud, real-time local"
    ]
}

Getting Started

  1. Check hardware: Verify NPU/DirectML availability (see the sketch below)
  2. Install Windows ML SDK: Enable local AI development
  3. Export models to ONNX: Standard format for Windows ML
  4. Optimize for NPU: Use quantization and optimization tools (also sketched below)
  5. Test offline: Ensure graceful degradation
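
Steps 1 and 4 are scriptable. The provider check uses stock onnxruntime, and quantize_dynamic comes from onnxruntime.quantization; the file names are placeholders:

import onnxruntime as ort
from onnxruntime.quantization import QuantType, quantize_dynamic

# Step 1: is a DirectML-capable accelerator visible to ONNX Runtime?
if "DmlExecutionProvider" not in ort.get_available_providers():
    print("No DirectML provider found - inference will fall back to CPU")

# Step 4: dynamic int8 quantization shrinks the model for NPU-class hardware
quantize_dynamic(
    model_input="model.onnx",
    model_output="model.int8.onnx",
    weight_type=QuantType.QInt8,
)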

Windows AI brings intelligence to the edge. For data professionals, this means faster, more private analytics on your desktop.

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.