Data Transformation

Transformation Concepts

Understand the NATIS data transformation model: lakehouse zones, transformation layers, and the pipeline DAG.

5 min read · Updated May 2025

NATIS organizes data across three lakehouse zones: Raw (as-ingested), Silver (cleaned and standardized), and Gold (business-ready aggregates). Transformation pipelines move data progressively through these zones, from Raw to Silver to Gold, using a directed acyclic graph (DAG) execution model in which each step runs only after the steps it depends on have finished.
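To make the DAG execution model concrete, here is a minimal sketch of how a run order can be derived from step dependencies. It uses Python's standard `graphlib` module, not a NATIS API, and the step names (`raw_orders`, `silver_orders`, and so on) are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a step, each value is the set of
# steps it depends on. These names are illustrative, not NATIS identifiers.
pipeline = {
    "raw_orders": set(),
    "raw_customers": set(),
    "silver_orders": {"raw_orders"},
    "silver_customers": {"raw_customers"},
    "gold_revenue_by_customer": {"silver_orders", "silver_customers"},
}


def execution_order(dag):
    """Return one valid run order; raises graphlib.CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(pipeline)
print(order)
```

Any order the sorter returns places every step after its dependencies, so the Gold step always comes last here. A dependency cycle is rejected outright, which is why the graph must be acyclic.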

Lakehouse Zones

| Zone | Also Known As | Data Quality | Typical Use |
| --- | --- | --- | --- |
| Raw Zone | Bronze Layer | As-ingested, no changes | Data retention, audit, replay |
| Silver Zone | Refined Layer | Cleaned, deduplicated, typed | Exploration, ad-hoc SQL |
| Gold Zone | Curated Layer | Aggregated, business-logic applied | BI dashboards, ML features |
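The progression between zones can be illustrated with a small, self-contained sketch in plain Python. The records, field names, and the `to_silver` and `to_gold` helpers are assumptions made for illustration. They do not represent NATIS functions or schemas.

```python
from collections import defaultdict

# Hypothetical as-ingested records (Raw zone): everything is a string,
# with stray whitespace, inconsistent casing, and one duplicate row.
raw_rows = [
    {"order_id": "1", "region": " north ", "amount": "10.50"},
    {"order_id": "2", "region": "South", "amount": "4.00"},
    {"order_id": "2", "region": "South", "amount": "4.00"},  # duplicate
    {"order_id": "3", "region": "north", "amount": "2.25"},
]


def to_silver(rows):
    """Clean, deduplicate (by order_id), and type the raw records."""
    seen = set()
    silver = []
    for row in rows:
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # drop repeated orders
        seen.add(order_id)
        silver.append({
            "order_id": order_id,
            "region": row["region"].strip().lower(),
            "amount": float(row["amount"]),
        })
    return silver


def to_gold(rows):
    """Aggregate cleaned records into a total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver_rows = to_silver(raw_rows)
gold_totals = to_gold(silver_rows)
print(gold_totals)
```

The raw list is never modified, which mirrors the Raw zone's role as the source for audit and replay: Silver and Gold can always be rebuilt from it.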

Transformation Methods

All transformations run on NATIS managed clusters. Compute costs are attributed to the pipeline owner's workspace budget. Use the Cost Estimator (available in the Pipeline Editor) before scheduling high-volume jobs.

  • SQL Transformations — drag-and-drop SQL nodes in the Pipeline Editor; supports CTEs, window functions, UDFs
  • Spark Notebooks — PySpark or Scala for complex transformations; full Spark 3.5 API available
  • dbt Integration — deploy and run dbt models natively within NATIS pipelines
  • Low-Code Builder — visual column mapping, type casting, filter, join, and pivot widgets
  • Python Scripts — general-purpose transformation scripts with pandas, polars, or PySpark
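As an illustration of the kind of query an SQL transformation might contain, the sketch below combines a CTE with a window function. It runs against an in-memory SQLite database purely so the example is self-contained. The SQL dialect NATIS uses is not stated here, and the table and column names are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for a Silver-zone table (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE silver_orders (order_id INTEGER, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO silver_orders VALUES (?, ?, ?)",
    [(1, "north", 10.5), (2, "south", 4.0), (3, "north", 2.25)],
)

# The CTE aggregates per region; the window function ranks regions by total.
query = """
WITH regional AS (
    SELECT region, SUM(amount) AS total
    FROM silver_orders
    GROUP BY region
)
SELECT region, total, RANK() OVER (ORDER BY total DESC) AS revenue_rank
FROM regional
ORDER BY revenue_rank
"""

ranked = conn.execute(query).fetchall()
for region, total, revenue_rank in ranked:
    print(revenue_rank, region, total)
```

Window functions require SQLite 3.25 or later, which current Python builds normally bundle.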
