2 min read
Medallion Architecture in Microsoft Fabric: Bronze, Silver, Gold Explained
The Medallion architecture organizes data into three layers: Bronze (raw), Silver (cleaned), and Gold (business-ready). This pattern provides clear data lineage, quality improvement at each stage, and separation of concerns.
Bronze Layer: Raw Data Landing
Bronze tables store data exactly as received from source systems, preserving the original format for replayability and debugging.
# Bronze layer ingestion
from pyspark.sql.functions import current_timestamp, input_file_name
def ingest_to_bronze(source_path: str, table_name: str):
df = spark.read.format("json").load(source_path)
# Add metadata columns
df_bronze = df \
.withColumn("_ingested_at", current_timestamp()) \
.withColumn("_source_file", input_file_name())
# Write to Bronze table
df_bronze.write \
.format("delta") \
.mode("append") \
.option("mergeSchema", "true") \
.saveAsTable(f"bronze.{table_name}")
ingest_to_bronze("Files/landing/orders/*.json", "raw_orders")
Silver Layer: Cleaned and Conformed
Silver tables apply data quality rules, standardize formats, and enforce schemas.
def transform_to_silver():
df_bronze = spark.table("bronze.raw_orders")
df_silver = df_bronze \
.filter(col("order_id").isNotNull()) \
.dropDuplicates(["order_id"]) \
.withColumn("order_date", to_date("order_date_str", "yyyy-MM-dd")) \
.withColumn("amount", col("amount").cast("decimal(18,2)")) \
.withColumn("_processed_at", current_timestamp()) \
.select("order_id", "customer_id", "order_date", "amount", "status",
"_processed_at", "_ingested_at")
# Upsert to Silver using MERGE
silver_table = DeltaTable.forName(spark, "silver.orders")
silver_table.alias("target").merge(
df_silver.alias("source"),
"target.order_id = source.order_id"
).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()
Gold Layer: Business Aggregates
Gold tables serve specific analytical use cases with pre-computed metrics.
def build_gold_metrics():
df = spark.table("silver.orders")
df_gold = df \
.groupBy("customer_id", date_trunc("month", "order_date").alias("month")) \
.agg(
sum("amount").alias("total_spend"),
count("order_id").alias("order_count"),
avg("amount").alias("avg_order_value")
)
df_gold.write.format("delta").mode("overwrite").saveAsTable("gold.customer_monthly_metrics")
The Medallion architecture scales from small projects to enterprise data platforms, providing consistent patterns your team can follow.