All Configuration Options
Below is a table listing all of the configurable options within the SQL Transformer. These can be set when configuring your stack or on an already-running Transformer.
To change an option at runtime, call the endpoint /updateConfig?configEntry=<entry>&configValue=<value>, where entry is the config item as listed below and value is the new value you wish to set. Any configuration changed while the Transformer is running can also be backed up and restored.
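As a sketch of the endpoint above, the following call raises the SQL limit on a running Transformer. The host and port (`localhost:8080`) are placeholders; substitute your own deployment's address.

```shell
# Update the sqlLimit config entry on a running Transformer.
# Host and port are assumptions -- replace with your deployment's address.
curl "http://localhost:8080/updateConfig?configEntry=sqlLimit&configValue=10000"
```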
Transformer Configuration
| Environment Variable | Entry | Default Value | Description |
| --- | --- | --- | --- |
| FRIENDLY_NAME | friendlyName | SQL-Transformer | The name you wish to give your Transformer. |
| TRANSFORMER_LICENSE | transformerLicense | | The license key required for running the Transformer. Only required when running a non-AWS-Marketplace version of the Transformer. |
| CRON_EXPRESSION | cronExpression | Switched off by default | The Quartz cron expression. Used by the Transformer to set up a time-based job scheduler, which schedules the Transformer to ingest your specified data from your database(s) periodically at fixed times, dates, or intervals. For example, the value `0 */30 * ? * * *` triggers the Transformer every 30 minutes, starting at :00 or :30 minutes past the hour. |
| SQL_LIMIT | sqlLimit | 0 | The maximum number of records that can be processed in any one query. If your database contains more records than this value, the Transformer batch-processes the records from the query and outputs multiple RDF files. To enable batching, set an integer greater than zero; the default of zero switches iterative queries off. |
| SQL_OFFSET | sqlOffset | 0 | Offsets the start index of the iterative processing. |
| CONCURRENT_THREADS | concurrentThreads | 4 | When the Transformer executes an iterative query, the iterations are run in parallel. This option sets how many threads are used for concurrent iteration execution. |
| CONTINUE_ON_ERROR | continueOnError | false | Determines whether execution continues or halts when an error occurs during an iteration. |
| TRANSFORMER_DIRECTORY | transformerDirectory | file:///var/local/ | The directory where all Transformer files are stored (assuming the individual file directory configs haven't been edited). On Transformer startup, if this has been declared, folders are created at the specified location for mapping, output, yaml-mapping, prov output, and config backup. |
| MAPPINGS_DIR_URL | mappingsDirUrl | file:///var/local/mapping/ | The URL of the directory containing the mapping file(s). Can be local or remote. |
| OUTPUT_DIR_URL | outputDirUrl | file:///var/local/output/ | The URL of the directory you wish the generated RDF to be output to. Can be local or remote. |
| OUTPUT_FILE_FORMAT | outputFileFormat | nquads | The file type constructed when the RDF is created. The options are: nquads, ntriples, jsonld, turtle, trig, and trix. |
| CONFIG_BACKUP | configBackup | file:///var/local/config-backup/ | The URL of the directory the config is backed up to when calling the upload config endpoint. |
| TRANSFORMER_RUN_STANDALONE | runStandalone | true | Each Transformer is designed to run as part of a larger end-to-end system, with the end result being that the data is uploaded to a Knowledge or Property Graph; Kafka is used to communicate between services. Standalone mode is enabled by default, so if you want the Transformer to communicate with the other services, set this property to false. |
| CUSTOM_FUNCTION_JAR_URL | customFunctionJarUrl | | If you require a function that cannot be performed using the built-in functions, it is possible to create and use your own. To do this, set this variable to the URL of your jar file containing the functions (S3 is also supported), and follow the instructions laid out in this guide. |
| CUSTOM_FUNCTION_TTL_URL | customFunctionTtlUrl | | If you require a function that cannot be performed using the built-in functions, it is possible to create and use your own. To do this, set this variable to the URL of your ttl file containing the mappings to your functions (S3 is also supported), and follow the instructions laid out in this guide. |
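As a sketch of how these options are supplied when configuring your stack, the environment variables above can be passed to the container at startup. The image name `sql-transformer` and all values here are illustrative placeholders, not the documented defaults.

```shell
# Start the Transformer with a custom name, a 30-minute schedule,
# batching enabled, and Turtle output. Image name and values are
# illustrative placeholders.
docker run -d \
  -e FRIENDLY_NAME="my-sql-transformer" \
  -e CRON_EXPRESSION="0 */30 * ? * * *" \
  -e SQL_LIMIT=10000 \
  -e OUTPUT_FILE_FORMAT=turtle \
  sql-transformer
```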
AWS Configuration
One of the methods for connecting to AWS and S3 from a locally running Transformer is to use environment variables to set the region, access key, and secret key. These must be set when running your Docker container. Alternative methods can be found in the AWS documentation.
| Environment Variable | Description |
| --- | --- |
| AWS_REGION | The region in AWS where your S3 buckets and files reside, for example "us-east-1". |
| AWS_ACCESS_KEY_ID | Your access key for AWS. |
| AWS_SECRET_ACCESS_KEY | Your secret key for AWS. |
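One way to supply these variables without putting credentials on the command line is an env file passed to `docker run`. The file name and all values below are placeholders.

```shell
# aws.env -- AWS settings for the Transformer container (placeholder values).
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=<your-access-key>
AWS_SECRET_ACCESS_KEY=<your-secret-key>
```

Then start the container with `docker run --env-file aws.env <image>`.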
Property Graph Configuration
| Environment Variable | Entry | Default Value | Description |
| --- | --- | --- | --- |
| PROPERTY_GRAPH_MODE | propertyGraphMode | false | If you are using a Property Graph as your target graph type, set this configuration to true; otherwise leave it as false. When set to true, the Transformer will output Nodes and Edges CSV files instead of RDF files. Please ensure you have constructed the correct mapping file to support property graphs. |
| PG_GRAPH | pgGraph | default | Set this property to the Property Graph provider you wish to use: neptune, tigergraph, or neo4j. By default, generic nodes and edges files are created; the previously mentioned graphs require specific file shapes. |
Kafka Configuration
| Environment Variable | Entry | Default Value | Description |
| --- | --- | --- | --- |
| KAFKA_BROKERS | kafkaBrokers | localhost:9092 | The Kafka Broker tells the Transformer where to look for your Kafka cluster. Set with the following structure: `<kafka-ip>:<kafka-port>`. The recommended port is 9092. |
| KAFKA_TOPIC_NAME_SOURCE | topicNameSource | source_urls | The topic the Consumer reads messages from in order to ingest data. Any message can trigger the Transformer, including an empty message. |
| KAFKA_TOPIC_NAME_DLQ | topicNameDLQ | dead_letter_queue | The topic used to push messages containing reasons for failure within the Transformer. These messages are represented as JSON. |
| KAFKA_TOPIC_NAME_SUCCESS | topicNameSuccess | success_queue | The topic used for messages containing the file URLs of the successfully transformed RDF data files. |
| KAFKA_GROUP_ID_CONFIG | groupIdConfig | consumerGroup1 | The identifier of the group this consumer belongs to. |
| KAFKA_AUTO_OFFSET_RESET_CONFIG | autoOffsetResetConfig | earliest | What to do when there is no initial offset in Kafka or an offset is out of range: `earliest` automatically resets the offset to the earliest offset; `latest` automatically resets it to the latest offset. |
| KAFKA_MAX_POLL_RECORDS | maxPollRecords | 100 | The maximum number of records returned in a single call to poll. |
| KAFKA_TIMEOUT | timeout | 1000 | Kafka consumer polling timeout. |
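To illustrate the broker setting, here is a sketch of connecting a non-standalone Transformer to an external Kafka cluster. The broker address and image name are placeholders, following the `<kafka-ip>:<kafka-port>` structure described above.

```shell
# Point the Transformer at an external Kafka cluster.
# Broker address and image name are illustrative placeholders.
docker run -d \
  -e TRANSFORMER_RUN_STANDALONE=false \
  -e KAFKA_BROKERS="10.0.0.5:9092" \
  -e KAFKA_TOPIC_NAME_SOURCE=source_urls \
  -e KAFKA_GROUP_ID_CONFIG=consumerGroup1 \
  sql-transformer
```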
Provenance Configuration
| Environment Variable | Entry | Default Value | Description |
| --- | --- | --- | --- |
| RECORD_PROVO | recordProvo | true | Parameter indicating whether the provenance metadata should be generated. |
| PROV_OUTPUT_DIR_URL | provOutputDirUrl | file:///var/local/prov-output/ | The URL of the directory for the provenance metadata. |
| PROV_KAFKA_BROKERS | provKafkaBrokers | localhost:9092 | The location of your Kafka cluster for provenance. This can be the same as or different from the broker for the Transformer. |
| PROV_KAFKA_TOPIC_NAME_DLQ | provTopicNameDLQ | prov_dead_letter_queue | The topic used for your dead letter queue provenance messages. This can be the same as or different from the DLQ topic for the Transformer. |
| PROV_KAFKA_TOPIC_NAME_SUCCESS | provTopicNameSuccess | prov_success_queue | The topic used for messages containing the file URLs of the successfully generated provenance files. This can be the same as or different from the success queue topic for the Transformer. |
| SWITCHED_OFF_ACTIVITIES | switchedOffActivities | | A comma-separated list of the provenance processes you wish to turn off. The Transformer contains the following processes: main-execution, queryProcessing, tableProcessing, and singleQueryExecution. |
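Since switchedOffActivities is a config entry, it can also be changed on a running Transformer via the /updateConfig endpoint described at the top of this page. The host and port are placeholders for your deployment.

```shell
# Turn off two provenance processes at runtime.
# Host and port are assumptions -- replace with your deployment's address.
curl "http://localhost:8080/updateConfig?configEntry=switchedOffActivities&configValue=queryProcessing,tableProcessing"
```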
Logging Configuration
| Environment Variable | Default Value | Description |
| --- | --- | --- |
| LOG_LEVEL_TRANSFORMER | INFO | Log level for Transformer loggers. Change to DEBUG to see more in-depth logs, or to WARN or ERROR to quiet the logging. |
Additional Logging Configuration
| Environment Variable | Default Value | Description |
| --- | --- | --- |
| LOGGING_LEVEL | WARN | Global log level. |
| LOGGING_APPENDERS_CONSOLE_TIMEZONE | UTC | Timezone for console logging. |
| LOGGING_APPENDERS_TXT_FILE_THRESHOLD | ALL | Threshold for text logging. |
| Log Format (not overridable) | `%-6level [%d{HH:mm:ss.SSS}] [%t] %logger{5} - %X{code} %msg %n` | Pattern for logging messages. |
| Current Log Filename (not overridable) | `/var/log/graphbuild/text/current/application_${applicationName}_${timeStamp}.txt.log` | Pattern for the log file name. |
| LOGGING_APPENDERS_TXT_FILE_ARCHIVE | true | Archive text log files. |
| Archived Log Filename Pattern (not overridable) | `/var/log/graphbuild/text/archive/application_${applicationName}_${timeStamp}_to_%d{yyyy-MM-dd}.txt.log` | Log file rollover frequency depends on the date pattern in this property. For example, `%d{yyyy-MM-ww}` declares weekly rollover. |
| LOGGING_APPENDERS_TXT_FILE_ARCHIVED_TXT_FILE_COUNT | 7 | Maximum number of archived text files. |
| LOGGING_APPENDERS_TXT_FILE_TIMEZONE | UTC | Timezone for text file logging. |
| LOGGING_APPENDERS_JSON_FILE_THRESHOLD | ALL | Threshold for JSON logging. |
| Log Format (not overridable) | `%-6level [%d{HH:mm:ss.SSS}] [%t] %logger{5} - %X{code} %msg %n` | Pattern for logging messages. |
| Current Log Filename (not overridable) | `/var/log/graphbuild/json/current/application_${applicationName}_${timeStamp}.json.log` | Pattern for the log file name. |
| LOGGING_APPENDERS_JSON_FILE_ARCHIVE | true | Archive JSON log files. |
| Archived Log Filename Pattern (not overridable) | `/var/log/graphbuild/json/archive/application_${applicationName}_${timeStamp}_to_%d{yyyy-MM-dd}.json.log` | Log file rollover frequency depends on the date pattern in this property. For example, `%d{yyyy-MM-ww}` declares weekly rollover. |
| LOGGING_APPENDERS_JSON_FILE_ARCHIVED_FILE_COUNT | 7 | Maximum number of archived JSON files. |
| LOGGING_APPENDERS_JSON_FILE_TIMEZONE | UTC | Timezone for JSON file logging. |