Batch Data Connectors
Configure batch ingestion jobs to load data from databases, file systems, and cloud storage.
Creating a Batch Ingestion Job
1. Navigate to Pipelines → Ingestion → New Ingestion Job.
2. Select a source type (e.g., PostgreSQL) and click Configure Connection.
3. Enter the connection details (host, port, database, username, and password), then click Test Connection.
4. Select the tables or queries to ingest. Use the schema browser to preview the data.
5. Choose a load strategy: Full Load, Incremental (by timestamp column), or Partition-based.
6. Set the destination: select a catalog, schema, and target table name in the Raw Zone.
7. Configure the schedule (a cron expression or a preset frequency) and click Save & Activate.
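Step 7 accepts a cron expression, and a quick local check can catch malformed fields before you paste one into the scheduler. The Python sketch below assumes NATIS uses the classic five-field cron syntax (minute, hour, day of month, month, day of week). This page does not say which cron dialect or which presets NATIS supports, and `validate_cron` is a hypothetical helper, not part of NATIS.

```python
import re

# Allowed numeric range for each field of a classic five-field cron expression.
# Assumption: NATIS accepts this syntax; its actual dialect is not documented here.
FIELD_RANGES = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 7),  # 0 and 7 both mean Sunday in common cron implementations
]

# One list element: "*", "N", or "N-M", optionally followed by "/STEP".
# Month and weekday names (JAN, MON, ...) are intentionally not supported.
ELEMENT = re.compile(r"^(\*|\d+(?:-\d+)?)(?:/(\d+))?$")


def validate_cron(expr: str) -> list[str]:
    """Return the problems found in a five-field cron expression (empty list if valid)."""
    fields = expr.split()
    if len(fields) != len(FIELD_RANGES):
        return [f"expected {len(FIELD_RANGES)} fields, got {len(fields)}"]

    problems = []
    for value, (name, low, high) in zip(fields, FIELD_RANGES):
        # A field may be a comma-separated list of elements, e.g. "1,15,30".
        for element in value.split(","):
            match = ELEMENT.match(element)
            if not match:
                problems.append(f"{name}: cannot parse {element!r}")
                continue
            base, step = match.groups()
            if step is not None and int(step) == 0:
                problems.append(f"{name}: step must be positive in {element!r}")
            if base == "*":
                continue
            bounds = [int(part) for part in base.split("-")]
            if any(b < low or b > high for b in bounds):
                problems.append(f"{name}: {element!r} outside {low}-{high}")
            elif len(bounds) == 2 and bounds[0] > bounds[1]:
                problems.append(f"{name}: range {element!r} is reversed")
    return problems


if __name__ == "__main__":
    print(validate_cron("0 2 * * *"))   # daily at 02:00 -> []
    print(validate_cron("0 25 * * *"))  # reports the out-of-range hour
```

If the NATIS scheduler rejects an expression that this sketch accepts, the scheduler's own rules take precedence.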
Connection String Formats
# PostgreSQL
source:
  type: postgresql
  host: db.example.com
  port: 5432
  database: production
  username: natis_reader
  password: "{{ secret:db_password }}"  # stored in NATIS Secrets Manager
  ssl: require

# MySQL
source:
  type: mysql
  host: db.example.com
  port: 3306
  database: orders
  username: natis_reader
  password: "{{ secret:mysql_password }}"

# S3 (File Source)
source:
  type: s3
  bucket: my-data-bucket
  prefix: /raw/sales/
  file_format: parquet
  region: ap-southeast-1
  credentials:
    type: iam_role
    role_arn: arn:aws:iam::123456789012:role/natis-s3-reader
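The `{{ secret:... }}` placeholders keep credentials out of the configuration itself. The Python sketch below shows one way such a placeholder could be resolved and the PostgreSQL fields turned into a libpq-style connection URI. It rests on assumptions: the plain `secrets` dictionary stands in for the NATIS Secrets Manager (whose API is not described on this page), the `ssl` field is assumed to map to libpq's `sslmode` parameter, and `resolve_secrets` and `postgres_uri` are hypothetical helper names.

```python
import re
from urllib.parse import quote

# Matches placeholders such as "{{ secret:db_password }}"; spaces inside the braces are optional.
SECRET_REF = re.compile(r"\{\{\s*secret:([A-Za-z0-9_.-]+)\s*\}\}")


def resolve_secrets(value: str, secrets: dict[str, str]) -> str:
    """Replace every secret placeholder in `value` with its stored value."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in secrets:
            raise KeyError(f"secret {name!r} is not defined")
        return secrets[name]
    return SECRET_REF.sub(lookup, value)


def postgres_uri(source: dict, secrets: dict[str, str]) -> str:
    """Build a libpq connection URI from PostgreSQL source fields like those above."""
    password = resolve_secrets(source["password"], secrets)
    # Percent-encode user and password so characters such as '@' or ':' cannot break the URI.
    user = quote(source["username"], safe="")
    pwd = quote(password, safe="")
    uri = f"postgresql://{user}:{pwd}@{source['host']}:{source['port']}/{source['database']}"
    if "ssl" in source:
        uri += f"?sslmode={source['ssl']}"  # assumption: NATIS "ssl" == libpq "sslmode"
    return uri


if __name__ == "__main__":
    source = {
        "type": "postgresql",
        "host": "db.example.com",
        "port": 5432,
        "database": "production",
        "username": "natis_reader",
        "password": "{{ secret:db_password }}",
        "ssl": "require",
    }
    # Stand-in for the secrets store; never hard-code real credentials.
    print(postgres_uri(source, {"db_password": "p@ss:word"}))
    # -> postgresql://natis_reader:p%40ss%3Aword@db.example.com:5432/production?sslmode=require
```

Percent-encoding matters because an unencoded `@` or `:` in a password would be read as part of the host or port.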
Incremental Load Configuration
For incremental loads, NATIS tracks the high-water mark of your chosen incremental key (usually a timestamp or auto-increment ID). On each run, only records with a value greater than the last high-water mark are loaded.
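The high-water mark logic can be modeled in a few lines of Python. This is a simplified in-memory sketch, not the NATIS implementation: rows are plain dictionaries, the incremental key is whatever column name you pass in, and `extract_incremental` is a hypothetical name. A real connector would more likely push the filter down to the source database as a `WHERE` clause.

```python
from datetime import datetime


def extract_incremental(rows, key, last_mark):
    """Return the rows whose `key` is greater than `last_mark`, plus the new high-water mark.

    `last_mark` is None on the first run, in which case every row is loaded.
    """
    new_rows = [r for r in rows if last_mark is None or r[key] > last_mark]
    # The new mark is the largest key in this batch; keep the old mark if nothing was new.
    new_mark = max((r[key] for r in new_rows), default=last_mark)
    return new_rows, new_mark


if __name__ == "__main__":
    rows = [
        {"id": 1, "updated_at": datetime(2024, 5, 1, 9, 0)},
        {"id": 2, "updated_at": datetime(2024, 5, 1, 10, 0)},
        {"id": 3, "updated_at": datetime(2024, 5, 1, 11, 0)},
    ]
    batch, mark = extract_incremental(rows, "updated_at", None)
    print([r["id"] for r in batch], mark)  # first run: ids 1, 2, 3; mark becomes 11:00
    batch, mark = extract_incremental(rows, "updated_at", mark)
    print([r["id"] for r in batch], mark)  # second run: no ids; mark stays at 11:00
```

Because the comparison is strict (`>`), a row that arrives later with a key at or below the stored mark is skipped; the Lookback Window setting listed below exists to catch such late-arriving records.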
Upsert mode requires the source table to have a primary key or another unique identifier. Without one, NATIS falls back to Append Only mode, which may produce duplicate records (for example, every update to an existing source row is appended as a new record).
- Incremental Key — Column used to detect new/updated records (e.g., updated_at)
- Load Type — Append Only or Upsert (requires primary key)
- Lookback Window — Additional time range to catch late-arriving records (default: 1 hour)
- Parallelism — Number of parallel threads for extraction (default: auto)
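The Lookback Window and Load Type settings interact, and a small Python model makes the interaction concrete. The sketch assumes the lookback is subtracted from the stored high-water mark to get the extraction lower bound (this page only calls it an additional time range), treats `id` as the primary key, and uses hypothetical function names. It shows why Upsert needs a primary key: rows re-read inside the lookback window are merged by key under Upsert but duplicated under Append Only.

```python
from datetime import datetime, timedelta

# Assumed to mirror the documented default Lookback Window of 1 hour.
DEFAULT_LOOKBACK = timedelta(hours=1)


def extract_with_lookback(rows, key, last_mark, lookback=DEFAULT_LOOKBACK):
    """Re-read rows newer than (last_mark - lookback) so late-arriving rows are not missed."""
    lower_bound = last_mark - lookback
    return [r for r in rows if r[key] > lower_bound]


def load_append(target, batch):
    """Append Only: every extracted row is added, so re-read rows become duplicates."""
    return target + batch


def load_upsert(target, batch, pk):
    """Upsert: rows are merged by primary key; an incoming row replaces the stored one."""
    merged = {row[pk]: row for row in target}
    for row in batch:
        merged[row[pk]] = row
    return list(merged.values())


if __name__ == "__main__":
    last_mark = datetime(2024, 5, 1, 12, 0)
    # Row 1 was loaded by the previous run; row 2 was committed late, with an
    # updated_at earlier than the stored high-water mark.
    target = [{"id": 1, "updated_at": datetime(2024, 5, 1, 11, 50)}]
    source = target + [{"id": 2, "updated_at": datetime(2024, 5, 1, 11, 40)}]

    no_lookback = extract_with_lookback(source, "updated_at", last_mark, timedelta(0))
    print(len(no_lookback))                       # 0: row 2 would be missed
    batch = extract_with_lookback(source, "updated_at", last_mark)
    print(len(load_append(target, batch)))        # 3 rows: id 1 appears twice
    print(len(load_upsert(target, batch, "id")))  # 2 rows: one per primary key
```

A longer lookback catches later stragglers at the cost of re-reading more rows on every run, which is harmless under Upsert but inflates the target under Append Only.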