Data Transformation
Pipeline Orchestration & Scheduling
Schedule, monitor, and manage pipeline runs with NATIS Workflow Orchestrator.
NATIS Workflow Orchestrator (NWO) is a built-in DAG-based orchestration engine compatible with Apache Airflow 2.x operators. You can define pipelines visually in the Pipeline Editor or as code using the NATIS Pipeline SDK.
Scheduling Options
- Cron Expression — Standard 5-field cron (e.g., 0 2 * * * for daily at 2AM)
- Preset Schedules — Every 15min, Hourly, Daily, Weekly, Monthly
- Event-Based Triggers — Trigger on file arrival, table update, or API webhook
- Dependency Chains — Start after upstream pipeline succeeds
- Manual / Ad-Hoc — Run on demand from UI, CLI, or API
Pipeline YAML Definition
YAML
# pipeline-daily-sales.yaml
name: daily_sales_pipeline
description: Ingest and transform daily sales data
schedule: "0 6 * * *" # Run daily at 6AM UTC
timezone: Asia/Ho_Chi_Minh
tags: [sales, daily, production]
tasks:
- id: ingest_raw_sales
type: ingestion
source: postgresql://prod-db/sales
destination: catalog.raw.sales
incremental_key: updated_at
on_failure: retry(3, delay=5m)
- id: transform_silver
type: sql
depends_on: [ingest_raw_sales]
query: |
INSERT INTO catalog.silver.sales_cleaned
SELECT * FROM catalog.raw.sales
WHERE amount > 0 AND currency IS NOT NULL
- id: build_gold_aggregates
type: notebook
depends_on: [transform_silver]
notebook_path: /notebooks/sales/daily-aggregates
cluster_size: medium
- id: refresh_dashboard
type: dashboard_refresh
depends_on: [build_gold_aggregates]
dashboard_id: dash_abc123
notifications:
on_failure:
- email: data-team@company.com
- slack: "#data-alerts"
Monitoring Pipeline Runs
View all pipeline runs under Pipelines → Run History. Each run shows: execution status, start/end time, duration, rows processed, compute cost, and a Gantt chart of task execution. Click any failed task to see the full error log and stack trace.
Was this page helpful?
Thanks for your feedback!