Back to Blog
2 min read

Medallion Architecture in Microsoft Fabric: Bronze, Silver, Gold Explained

The Medallion architecture organizes data into three layers: Bronze (raw), Silver (cleaned), and Gold (business-ready). This pattern provides clear data lineage, quality improvement at each stage, and separation of concerns.

Bronze Layer: Raw Data Landing

Bronze tables store data exactly as received from source systems, preserving the original format for replayability and debugging.

# Bronze layer ingestion
from pyspark.sql.functions import current_timestamp, input_file_name

def ingest_to_bronze(source_path: str, table_name: str):
    df = spark.read.format("json").load(source_path)

    # Add metadata columns
    df_bronze = df \
        .withColumn("_ingested_at", current_timestamp()) \
        .withColumn("_source_file", input_file_name())

    # Write to Bronze table
    df_bronze.write \
        .format("delta") \
        .mode("append") \
        .option("mergeSchema", "true") \
        .saveAsTable(f"bronze.{table_name}")

ingest_to_bronze("Files/landing/orders/*.json", "raw_orders")

Silver Layer: Cleaned and Conformed

Silver tables apply data quality rules, standardize formats, and enforce schemas.

def transform_to_silver():
    df_bronze = spark.table("bronze.raw_orders")

    df_silver = df_bronze \
        .filter(col("order_id").isNotNull()) \
        .dropDuplicates(["order_id"]) \
        .withColumn("order_date", to_date("order_date_str", "yyyy-MM-dd")) \
        .withColumn("amount", col("amount").cast("decimal(18,2)")) \
        .withColumn("_processed_at", current_timestamp()) \
        .select("order_id", "customer_id", "order_date", "amount", "status",
                "_processed_at", "_ingested_at")

    # Upsert to Silver using MERGE
    silver_table = DeltaTable.forName(spark, "silver.orders")

    silver_table.alias("target").merge(
        df_silver.alias("source"),
        "target.order_id = source.order_id"
    ).whenMatchedUpdateAll().whenNotMatchedInsertAll().execute()

Gold Layer: Business Aggregates

Gold tables serve specific analytical use cases with pre-computed metrics.

def build_gold_metrics():
    df = spark.table("silver.orders")

    df_gold = df \
        .groupBy("customer_id", date_trunc("month", "order_date").alias("month")) \
        .agg(
            sum("amount").alias("total_spend"),
            count("order_id").alias("order_count"),
            avg("amount").alias("avg_order_value")
        )

    df_gold.write.format("delta").mode("overwrite").saveAsTable("gold.customer_monthly_metrics")

The Medallion architecture scales from small projects to enterprise data platforms, providing consistent patterns your team can follow.

Michael John Peña

Michael John Peña

Senior Data Engineer based in Sydney. Writing about data, cloud, and technology.