SQL Engine Reference
Complete reference for the NATIS SQL Engine — syntax, functions, performance tips, and supported SQL dialects.
The NATIS SQL Engine is ANSI SQL-compliant with extensions for Delta Lake operations, time-travel queries, and ML inference. It supports 700+ built-in functions and is compatible with HiveQL, SparkSQL, and standard ANSI SQL syntax.
Running SQL Queries
```sql
-- Basic aggregation
SELECT
  region,
  product_category,
  DATE_TRUNC('month', sale_date) AS month,
  SUM(revenue) AS total_revenue,
  COUNT(DISTINCT customer_id) AS unique_customers
FROM catalog.gold.sales_transactions
WHERE sale_date >= '2024-01-01'
GROUP BY 1, 2, 3
ORDER BY month DESC, total_revenue DESC;
```
```sql
-- Time travel query (query data as of a specific timestamp)
SELECT * FROM catalog.silver.customers
TIMESTAMP AS OF '2024-06-01 00:00:00'
WHERE country = 'VN';
```
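Time travel also works by version number rather than timestamp. A minimal sketch, assuming the engine follows Delta Lake's `VERSION AS OF` syntax (the same form used by `RESTORE TABLE` in the extensions table below); version `12` is a placeholder you would pick from the history output:

```sql
-- Inspect the table's version history (see DESCRIBE HISTORY below).
DESCRIBE HISTORY catalog.silver.customers;

-- Query the table as of a specific version; 12 is a hypothetical
-- version number taken from the DESCRIBE HISTORY output.
SELECT * FROM catalog.silver.customers
VERSION AS OF 12
WHERE country = 'VN';
```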
```sql
-- Window function example
SELECT
  customer_id,
  order_date,
  order_value,
  SUM(order_value) OVER (
    PARTITION BY customer_id
    ORDER BY order_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS cumulative_spend
FROM catalog.silver.orders;
```
Delta Lake SQL Extensions
| Command | Description | Example |
| --- | --- | --- |
| `MERGE INTO` | Upsert: insert or update based on condition | `MERGE INTO target USING source ON key WHEN MATCHED THEN UPDATE SET ...` |
| `DESCRIBE HISTORY` | View table version history | `DESCRIBE HISTORY catalog.silver.orders` |
| `RESTORE TABLE` | Restore table to a previous version | `RESTORE TABLE orders TO VERSION AS OF 5` |
| `OPTIMIZE` | Compact small files for better read performance | `OPTIMIZE catalog.silver.orders WHERE date = '2024-06-01'` |
| `VACUUM` | Remove old files beyond retention period | `VACUUM catalog.silver.orders RETAIN 168 HOURS` |
| `ZORDER BY` | Co-locate data for faster filtered reads | `OPTIMIZE orders ZORDER BY (customer_id, order_date)` |
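The `MERGE INTO` entry above is abbreviated. Here is a fuller sketch of the upsert pattern, using `catalog.bronze.order_updates` as a hypothetical staging source:

```sql
-- Upsert staged changes into the orders table.
-- catalog.bronze.order_updates is a hypothetical staging table.
MERGE INTO catalog.silver.orders AS t
USING catalog.bronze.order_updates AS s
  ON t.order_id = s.order_id
WHEN MATCHED THEN
  UPDATE SET t.order_value = s.order_value,
             t.order_date  = s.order_date
WHEN NOT MATCHED THEN
  INSERT (order_id, customer_id, order_date, order_value)
  VALUES (s.order_id, s.customer_id, s.order_date, s.order_value);
```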
Performance Best Practices
The Query Profiler (accessible via the ⓘ icon in the SQL Workspace) shows the execution plan, stage durations, data scanned, and cache hit rates for any query.
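For a text-based view of the plan without opening the profiler, SparkSQL-compatible engines generally support `EXPLAIN`. A sketch, assuming NATIS inherits this from its SparkSQL compatibility:

```sql
-- Show the formatted physical plan without executing the query.
EXPLAIN FORMATTED
SELECT region, SUM(revenue)
FROM catalog.gold.sales_transactions
WHERE sale_date >= '2024-01-01'
GROUP BY region;
```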
- Use `ZORDER BY` on columns frequently used in `WHERE` clauses; on selective filters this can deliver 10-100x faster reads (see the combined sketch after this list)
- Run `OPTIMIZE` weekly on high-write tables to consolidate small files
- Partition large tables by a low-cardinality column (e.g., date, region, country)
- Use approximate functions such as `APPROX_COUNT_DISTINCT` for large-scale analytics
- Cache frequently accessed DataFrames in Spark notebooks with `.cache()` or `.persist()`
- Use `BROADCAST` hints for small lookup tables in `JOIN` operations
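The sketch below ties several of these tips together. All table and column names are hypothetical, and the syntax assumes the Delta Lake and SparkSQL conventions used elsewhere on this page (`PARTITIONED BY`, `OPTIMIZE ... ZORDER BY`, `APPROX_COUNT_DISTINCT`, and the `/*+ BROADCAST(...) */` join hint):

```sql
-- Hypothetical events table, partitioned by a low-cardinality column.
CREATE TABLE catalog.silver.events (
  event_id    BIGINT,
  customer_id BIGINT,
  event_date  DATE,
  payload     STRING
)
USING DELTA
PARTITIONED BY (event_date);

-- Weekly maintenance: compact small files and co-locate the
-- hot filter column so selective reads skip more files.
OPTIMIZE catalog.silver.events ZORDER BY (customer_id);

-- Approximate distinct counts trade exactness for speed on large scans.
SELECT event_date, APPROX_COUNT_DISTINCT(customer_id) AS est_customers
FROM catalog.silver.events
GROUP BY event_date;

-- Broadcast the small lookup table so the join avoids a shuffle.
SELECT /*+ BROADCAST(c) */ e.event_id, c.country
FROM catalog.silver.events e
JOIN catalog.silver.customers c
  ON e.customer_id = c.customer_id;
```

The `BROADCAST` hint ships the small table to every executor, replacing the shuffle a regular join would otherwise require.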