SQL Engine Reference

Complete reference for the NATIS SQL Engine — syntax, functions, performance tips, and supported SQL dialects.

9 min read · Updated May 2025

The NATIS SQL Engine is ANSI SQL-compliant with extensions for Delta Lake operations, time-travel queries, and ML inference. It supports 700+ built-in functions and is compatible with HiveQL, SparkSQL, and standard ANSI SQL syntax.

Running SQL Queries

SQL
-- Basic aggregation
SELECT 
  region,
  product_category,
  DATE_TRUNC('month', sale_date) AS month,
  SUM(revenue) AS total_revenue,
  COUNT(DISTINCT customer_id) AS unique_customers
FROM catalog.gold.sales_transactions
WHERE sale_date >= '2024-01-01'
GROUP BY 1, 2, 3
ORDER BY month DESC, total_revenue DESC;

-- Time travel query (query data as of a specific date)
SELECT * FROM catalog.silver.customers
TIMESTAMP AS OF '2024-06-01 00:00:00'
WHERE country = 'VN';

-- Window function example
SELECT
  customer_id,
  order_date,
  order_value,
  SUM(order_value) OVER (
    PARTITION BY customer_id 
    ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS cumulative_spend
FROM catalog.silver.orders;

Delta Lake SQL Extensions

Command | Description | Example
--- | --- | ---
MERGE INTO | Upsert: insert or update based on condition | MERGE INTO target USING source ON key WHEN MATCHED THEN UPDATE SET ...
DESCRIBE HISTORY | View table version history | DESCRIBE HISTORY catalog.silver.orders
RESTORE TABLE | Restore table to a previous version | RESTORE TABLE orders TO VERSION AS OF 5
OPTIMIZE | Compact small files for better read performance | OPTIMIZE catalog.silver.orders WHERE date = '2024-06-01'
VACUUM | Remove old files beyond retention period | VACUUM catalog.silver.orders RETAIN 168 HOURS
ZORDER BY | Co-locate data for faster filtered reads | OPTIMIZE orders ZORDER BY (customer_id, order_date)
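The MERGE INTO row in the table above elides the full syntax. Written out, a typical upsert looks like the sketch below; the target and source table names are illustrative, not part of this reference:

```sql
-- Upsert staged customer updates into a Delta table
-- (catalog.silver.customers and catalog.bronze.customer_updates
--  are hypothetical names used for illustration)
MERGE INTO catalog.silver.customers AS target
USING catalog.bronze.customer_updates AS source
  ON target.customer_id = source.customer_id
WHEN MATCHED THEN
  UPDATE SET
    target.email = source.email,
    target.updated_at = source.updated_at
WHEN NOT MATCHED THEN
  INSERT (customer_id, email, updated_at)
  VALUES (source.customer_id, source.email, source.updated_at);
```

Rows whose `customer_id` already exists in the target are updated in place; all other source rows are inserted, so the statement is idempotent when re-run on the same source snapshot.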

Performance Best Practices

The Query Profiler (accessible via the ⓘ icon in the SQL Workspace) shows the execution plan, stage durations, data scanned, and cache hit rates for any query.
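Before opening the profiler, you can also inspect a query's plan directly in SQL. Assuming the engine's SparkSQL compatibility extends to the standard EXPLAIN statement (an assumption, not confirmed by this page), a quick check looks like:

```sql
-- Inspect the logical/physical plan for a query before running it
-- (table name is illustrative)
EXPLAIN
SELECT region, SUM(revenue) AS total_revenue
FROM catalog.gold.sales_transactions
WHERE sale_date >= '2024-01-01'
GROUP BY region;
```

Look for full-table scans on large tables in the output; those are the queries that benefit most from the partitioning and ZORDER tips below.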

  • Use ZORDER BY on columns frequently used in WHERE clauses for 10-100x faster reads
  • Run OPTIMIZE weekly on high-write tables to consolidate small files
  • Partition large tables by a low-cardinality column (e.g., date, region, country)
  • Use approximate functions like APPROX_COUNT_DISTINCT for large-scale analytics
  • Cache frequently accessed DataFrames in Spark notebooks with .cache() or .persist()
  • Use BROADCAST hints for small lookup tables in JOIN operations
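The last two tips can be combined in a single query. A sketch with hypothetical table names, assuming SparkSQL-style hint syntax:

```sql
-- Approximate distinct counts plus a broadcast hint for a small lookup table
-- (catalog.silver.orders and catalog.gold.region_lookup are illustrative)
SELECT /*+ BROADCAST(r) */
  r.region_name,
  APPROX_COUNT_DISTINCT(o.customer_id) AS approx_customers
FROM catalog.silver.orders AS o
JOIN catalog.gold.region_lookup AS r
  ON o.region_id = r.region_id
GROUP BY r.region_name;
```

Broadcasting the lookup table avoids a shuffle of the large orders table, and the approximate count trades a small, bounded error for a much cheaper aggregation at scale.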
