Back to Blog
1 min read

Reference Data Joins in Stream Analytics

Reference data joins enrich streaming data with static or slowly changing dimension data, enabling contextual analytics.

Reference Data Sources

{
  "type": "Microsoft.StreamAnalytics/streamingjobs/inputs",
  "properties": {
    "type": "Reference",
    "datasource": {
      "type": "Microsoft.Storage/Blob",
      "properties": {
        "storageAccounts": [{
          "accountName": "mystorageaccount",
          "accountKey": "..."
        }],
        "container": "reference-data",
        "pathPattern": "products/{date}/{time}/products.json",
        "dateFormat": "yyyy-MM-dd",
        "timeFormat": "HH-mm"
      }
    }
  }
}

Join Query

SELECT
    stream.transactionId,
    stream.productId,
    stream.quantity,
    stream.timestamp,
    ref.productName,
    ref.category,
    ref.unitPrice,
    stream.quantity * ref.unitPrice AS totalAmount
INTO
    enrichedOutput
FROM
    transactionStream stream
JOIN
    productReference ref
ON
    stream.productId = ref.productId

SQL Database Reference

-- Configure SQL reference
SELECT
    s.sensorId,
    s.reading,
    d.location,
    d.threshold,
    CASE WHEN s.reading > d.threshold THEN 'ALERT' ELSE 'NORMAL' END AS status
FROM sensorStream s
JOIN deviceConfig d
ON s.sensorId = d.sensorId

Refresh Patterns

  • Static: Loaded once at job start
  • Periodic: Refreshes at intervals
  • Dynamic: SQL queries for latest data

Summary

Reference data joins combine streaming data with dimension data for enriched, contextual analytics.


References:

Michael John Peña

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.