SQL Iteration for Large Dataset Transformations

When transforming large datasets, iteration breaks the process into smaller chunks so that each chunk can be processed more efficiently. The SQL Transformer also supports multi-threading: different threads can process separate iterations simultaneously, which can significantly speed up transformations. To enable iteration, check the tick box at the bottom of the query window.

Enabling iteration displays the iteration settings window, which includes two fields: SQL Limit and SQL Offset.

  • SQL Limit specifies the number of rows each iteration will contain.

  • SQL Offset specifies how many rows at the start of the dataset are skipped by the transformation.

For example, suppose a table contains 1,000,000 rows, SQL Limit is set to 100,000, and SQL Offset is set to 200,000. The first 200,000 rows are skipped, and the remaining 800,000 rows are divided into eight iterations of 100,000 rows each, resulting in eight separate RDF files.
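The arithmetic in this example can be checked with a short Python sketch. It is an illustration only, not the SQL Transformer's implementation: the function name `iteration_windows` and its parameters are hypothetical, and it assumes that a final chunk smaller than SQL Limit is still processed as its own iteration, which the example above does not need to exercise.

```python
def iteration_windows(total_rows, sql_limit, sql_offset):
    """Return (start_row, row_count) pairs, one per iteration.

    Hypothetical helper: rows before sql_offset are skipped, and the
    remaining rows are split into chunks of at most sql_limit rows.
    """
    if sql_limit <= 0:
        raise ValueError("sql_limit must be a positive number of rows")
    windows = []
    start = sql_offset
    while start < total_rows:
        # Assumption: the last chunk may be smaller than sql_limit.
        size = min(sql_limit, total_rows - start)
        windows.append((start, size))
        start += sql_limit
    return windows


# The example from the text: 1,000,000 rows, limit 100,000, offset 200,000.
windows = iteration_windows(1_000_000, 100_000, 200_000)
print(len(windows))   # 8
print(windows[0])     # (200000, 100000)
print(windows[-1])    # (900000, 100000)
```

Each pair corresponds to one iteration, and therefore to one output RDF file in the example.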

Note: Iteration cannot be used with queries that already include LIMIT or OFFSET clauses. These keywords must be removed before enabling iteration.
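A query can be checked for these keywords before iteration is enabled. The sketch below is a rough heuristic rather than a feature of the SQL Transformer: the function name `has_limit_or_offset` is hypothetical, and the check matches whole words only, so it will also flag the keywords when they appear inside string literals or comments.

```python
import re

# Matches LIMIT or OFFSET as whole words, in any letter case.
_LIMIT_OFFSET = re.compile(r"\b(LIMIT|OFFSET)\b", re.IGNORECASE)


def has_limit_or_offset(query: str) -> bool:
    """Return True if the query text contains a LIMIT or OFFSET keyword."""
    return _LIMIT_OFFSET.search(query) is not None


print(has_limit_or_offset("SELECT * FROM orders LIMIT 10"))       # True
print(has_limit_or_offset("SELECT id FROM orders ORDER BY id"))   # False
```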
