# All Configuration Options

The tables below list all of the configurable options within the Semi-Structured Transformer. These can be set when configuring your stack or changed on an already running Transformer.

To do this, call the endpoint `/updateConfig?configEntry=<entry>&configValue=<value>`, where `entry` is the config item as listed below and `value` is the new value you wish to set. Any configuration changed while the Transformer is running can also be backed up and restored.
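For example, the output format of a running Transformer could be changed with a request like the following sketch. The host and port (`localhost:8080`) and the use of `curl` are assumptions for illustration; substitute your own deployment's address:

```shell
# Build the updateConfig URL for a running Transformer.
# localhost:8080 is an assumed address; replace it with your deployment's.
TRANSFORMER_HOST="localhost:8080"
CONFIG_ENTRY="outputFileFormat"   # the Entry column from the tables below
CONFIG_VALUE="turtle"             # the new value to set
URL="http://${TRANSFORMER_HOST}/updateConfig?configEntry=${CONFIG_ENTRY}&configValue=${CONFIG_VALUE}"
echo "$URL"
# curl "$URL"   # uncomment to send the request to your Transformer
```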

## Transformer Configuration <a href="#transformer-configuration" id="transformer-configuration"></a>

| **Environment Variable**     | **Entry**            | **Default Value**                | **Description**                                                                                                                                                                                                                                                                                                                                                                                             |
| ---------------------------- | -------------------- | -------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| FRIENDLY\_NAME               | friendlyName         | Semi-Structured Transformer      | The name you wish to call your Transformer.                                                                                                                                                                                                                                                                                                                                                                 |
| TRANFORMER\_LICENSE          | transformerLicense   |                                  | The license key required for running the Transformer. Only required when running a non-AWS Marketplace version of the Transformer.                                                                                                                                                                          |
| TRANSFORMER\_DIRECTORY       | transformerDirectory | file:///var/local/               | The directory where all Transformer files are stored (assuming the individual file directory configs haven’t been edited). On startup, if this has been declared, the Transformer will create folders at the specified location for mapping, output, yaml-mapping, prov output, and config backup.          |
| MAPPINGS\_DIR\_URL           | mappingsDirUrl       | file:///var/local/mapping/       | The URL of the directory containing the mapping file(s). Can be local or remote, see [here](https://data-lens.atlassian.net/wiki/spaces/GraphBuild/pages/1769046358/User+Guide+-+Semi+Structured+Transformer#Directories-in-Transformers) for more details.                                                                                                                                                 |
| OUTPUT\_DIR\_URL             | outputDirUrl         | file:///var/local/output/        | The URL of the directory you wish the generated RDF to be output to. Can be local or remote                                                                                                                                                                                                                                                                                                                 |
| OUTPUT\_FILE\_FORMAT         | outputFileFormat     | nquads                           | The file type that will be constructed when the RDF is created. The options are: `nquads`, `ntriples`, `jsonld`, `turtle`, `trig`, and `trix`.                                                                                                                                                                                                                                                              |
| CONFIG\_BACKUP               | configBackup         | file:///var/local/config-backup/ | The URL of the directory where the config will be backed up when calling the [upload config](https://data-lens.atlassian.net/wiki/spaces/GraphBuild/pages/1769046358/User+Guide+-+Semi+Structured+Transformer#RESTful-API-Endpoint) endpoint.                                                               |
| MAX\_CSV\_ROWS               | maxCsvRows           | 100000                           | The maximum number of rows a CSV file can be before it is split into smaller files (of specified length) and processed individually.                                                                                                                                                                                                                                                                        |
| VALIDATE\_CSV                | validateCsv          | true                             | By default, the Transformer will check a CSV file is valid and remove any invalid or multiline rows. To turn this off, set this property to false.                                                                                                                                                                                                                                                          |
| TRANSFORMER\_RUN\_STANDALONE | runStandalone        | true                             | Each of the Transformers is designed to run as part of a larger end-to-end system, with the end result being graph data uploaded to a Knowledge or Property Graph; Kafka is used to communicate between services. To run the Transformer standalone, without communicating with other services, leave this property set to true (the default); set it to false to enable communication over Kafka.           |
| CUSTOM\_FUNCTION\_JAR\_URL   | customFunctionJarUrl |                                  | If you require a function to be executed that doesn’t perform the required operation using the built-in functions, it is possible to create and use your own. To do this, set this variable to the URL of your jar file containing the functions (S3 is also supported), and follow the instructions laid out in [this guide](/EGeX4aTAJLlpg9Hh8kfl/functions/create-custom-functions.md).                  |
| CUSTOM\_FUNCTION\_TTL\_URL   | customFunctionTtlUrl |                                  | If you require a function to be executed that doesn’t perform the required operation using the built-in functions, it is possible to create and use your own. To do this, set this variable to the URL of your ttl file containing the mappings to your functions (S3 is also supported), and follow the instructions laid out in [this guide](/EGeX4aTAJLlpg9Hh8kfl/functions/create-custom-functions.md). |
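As a sketch, these environment variables can be set up front when starting the container. The image name and the values below are illustrative assumptions, not part of this reference:

```shell
# Hypothetical docker invocation; the image name and values are placeholders.
docker run -d \
  -e FRIENDLY_NAME="My Transformer" \
  -e TRANSFORMER_DIRECTORY="file:///var/local/" \
  -e OUTPUT_FILE_FORMAT="turtle" \
  -e MAX_CSV_ROWS="50000" \
  semi-structured-transformer:latest
```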


## AWS Configuration <a href="#aws-configuration" id="aws-configuration"></a>

One of the methods for connecting to AWS and S3 on a locally running Transformer is by using environment variables to set the region, access key, and secret key. This must be done when running your docker container. Alternate methods can be found in the [AWS documentation](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html).

| **Environment Variable** | **Description**                                                                    |
| ------------------------ | ---------------------------------------------------------------------------------- |
| AWS\_REGION              | The region in AWS where your S3 buckets and files reside, for example “us-east-1”. |
| AWS\_ACCESS\_KEY\_ID     | Your access key for AWS.                                                           |
| AWS\_SECRET\_ACCESS\_KEY | Your secret key for AWS.                                                           |
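Passing these variables to the container might look like the following sketch; the image name and credential placeholders are assumptions for illustration:

```shell
# Hypothetical docker invocation; substitute your own image name and credentials.
docker run -d \
  -e AWS_REGION="us-east-1" \
  -e AWS_ACCESS_KEY_ID="<your-access-key>" \
  -e AWS_SECRET_ACCESS_KEY="<your-secret-key>" \
  semi-structured-transformer:latest
```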


## Property Graph Configuration <a href="#property-graph-configuration" id="property-graph-configuration"></a>

| **Environment Variable** | **Entry**         | **Default Value** | **Description**                                                                                                                                                                                                                                                                                                                            |
| ------------------------ | ----------------- | ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| PROPERTY\_GRAPH\_MODE    | propertyGraphMode | false             | If you are using a Property Graph as your target graph type, set this configuration to true, otherwise leave as false. When set to true, the Transformer will output Nodes and Edges CSV files instead of RDF files.                                                                                                                       |
| PG\_GRAPH                | pgGraph           | default           | Set this property to the Property Graph provider you wish to use: `neptune-gremlin`, `neptune-cypher`, `tigergraph`, or `neo4j`. By default, nodes and edges files will be created; however, the previously mentioned graphs require specific file shapes. Select `default` if you wish to use the Graph Writer to write to your property graph. |


## Kafka Configuration <a href="#kafka-configuration" id="kafka-configuration"></a>

| **Environment Variable**           | **Entry**             | **Default Value**   | **Description**                                                                                                                                                                                                                                               |
| ---------------------------------- | --------------------- | ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| KAFKA\_BROKERS                     | kafkaBrokers          | localhost:9092      | Tells the Transformer where to find your Kafka Cluster. Set with the following structure: `<kafka-ip>:<kafka-port>`. The recommended port is `9092`.                                                                                                          |
| KAFKA\_TOPIC\_NAME\_SOURCE         | topicNameSource       | source\_urls        | The topic used for the Consumer to read messages from containing input file URLs in order to ingest data.                                                                                                                                                     |
| KAFKA\_TOPIC\_NAME\_DLQ            | topicNameDLQ          | dead\_letter\_queue | The topic used to push messages containing reasons for failure within the Transformer. These messages are represented as JSON.                                                                                                                                |
| KAFKA\_TOPIC\_NAME\_SUCCESS        | topicNameSuccess      | success\_queue      | The topic used for the messages sent containing the file URLs of the successfully transformed graph data files.                                                                                                                                               |
| KAFKA\_GROUP\_ID\_CONFIG           | groupIdConfig         | consumerGroup1      | The identifier of the group this consumer belongs to.                                                                                                                                                                                                         |
| KAFKA\_AUTO\_OFFSET\_RESET\_CONFIG | autoOffsetResetConfig | earliest            | <p>What to do when there is no initial offset in Kafka or if an offset is out of range.</p><p><code>earliest</code>: automatically reset the offset to the earliest offset</p><p><code>latest</code>: automatically reset the offset to the latest offset</p> |
| KAFKA\_MAX\_POLL\_RECORDS          | maxPollRecords        | 100                 | The maximum number of records returned in a single call to poll.                                                                                                                                                                                              |
| KAFKA\_TIMEOUT                     | timeout               | 1000                | Kafka consumer polling timeout.                                                                                                                                                                                                                               |
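A container pointed at an external Kafka Cluster might be started as in the sketch below; the broker address, the overridden values, and the image name are assumptions:

```shell
# Hypothetical docker invocation; broker address and values are illustrative.
docker run -d \
  -e KAFKA_BROKERS="10.0.0.5:9092" \
  -e KAFKA_GROUP_ID_CONFIG="consumerGroup1" \
  -e KAFKA_MAX_POLL_RECORDS="200" \
  semi-structured-transformer:latest
```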


## Provenance Configuration <a href="#provenance-configuration" id="provenance-configuration"></a>

| **Environment Variable**          | **Entry**             | **Default Value**              | **Description**                                                                                                                                                                                                                          |
| --------------------------------- | --------------------- | ------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| RECORD\_PROVO                     | recordProvo           | true                           | Indicates whether provenance metadata should be generated.                                                                                                                                                                                |
| PROV\_OUTPUT\_DIR\_URL            | provOutputDirUrl      | file:///var/local/prov-output/ | The URL of the directory for the provenance metadata.                                                                                                                                                                                     |
| PROV\_KAFKA\_BROKERS              | provKafkaBrokers      | localhost:9092                 | The location of your Kafka Cluster for provenance. This can be the same as or different from your broker for the Transformer.                                                                                                             |
| PROV\_KAFKA\_TOPIC\_NAME\_DLQ     | provTopicNameDLQ      | prov\_dead\_letter\_queue      | The topic used for your dead letter queue provenance messages. This can be the same as or different from your DLQ topic for the Transformer.                                                                                              |
| PROV\_KAFKA\_TOPIC\_NAME\_SUCCESS | provTopicNameSuccess  | prov\_success\_queue           | The topic used for messages containing the file URLs of the successfully generated provenance files. This can be the same as or different from your success queue topic for the Transformer.                                               |
| SWITCHED\_OFF\_ACTIVITIES         | switchedOffActivities |                                | <p>A comma-separated list of the provenance processes you wish to turn off.</p><p>The Transformer contains the following processes: <code>main-execution</code>, <code>kafkaActivity</code>, and <code>transformer-iteration</code></p>   |
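For instance, individual provenance activities could be switched off on a running Transformer through the `updateConfig` endpoint described at the top of this page. The host and port below are assumptions for illustration:

```shell
# Disable two provenance activities with one comma-separated value.
# localhost:8080 is an assumed address; replace with your deployment's.
TRANSFORMER_HOST="localhost:8080"
URL="http://${TRANSFORMER_HOST}/updateConfig?configEntry=switchedOffActivities&configValue=kafkaActivity,transformer-iteration"
echo "$URL"
# curl "$URL"   # uncomment to apply against your Transformer
```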


## Logging Configuration <a href="#logging-configuration" id="logging-configuration"></a>

| **Environment Variable** | **Default Value** | **Description**                                                                                                          |
| ------------------------ | ----------------- | ------------------------------------------------------------------------------------------------------------------------ |
| LOG\_LEVEL\_TRANSFORMER  | INFO              | Log level for Transformer loggers - change to DEBUG to see more in depth logs, or to WARN or ERROR to quiet the logging. |


## Additional Logging Configuration <a href="#additional-logging-configuration" id="additional-logging-configuration"></a>

| **Environment Variable**                                  | **Default Value**                                                                                           | **Description**                                                                                                           |
| --------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
| LOGGING\_LEVEL                                            | WARN                                                                                                        | Global log level                                                                                                          |
| LOGGING\_APPENDERS\_CONSOLE\_TIMEZONE                     | UTC                                                                                                         | Timezone for console logging                                                                                              |
| LOGGING\_APPENDERS\_TXT\_FILE\_THRESHOLD                  | ALL                                                                                                         | Threshold for text file logging                                                                                           |
| Log Format (not overridable)                              | %-6level \[%d{HH:mm:ss.SSS}] \[%t] %logger{5} - %X{code} %msg %n                                            | Pattern for logging messages                                                                                              |
| Current Log Filename (not overridable)                    | /var/log/graphbuild/text/current/application\_${applicationName}\_${timeStamp}.txt.log                      | Pattern for log file name                                                                                                 |
| LOGGING\_APPENDERS\_TXT\_FILE\_ARCHIVE                    | true                                                                                                        | Archive log text files                                                                                                    |
| Archived Log Filename Pattern (not overridable)           | /var/log/graphbuild/text/archive/application\_${applicationName}\_${timeStamp}\_to\_%d{yyyy-MM-dd}.txt.log  | Rollover frequency depends on the date pattern in the archived filename; for example, %d{yyyy-MM-ww} declares weekly rollover |
| LOGGING\_APPENDERS\_TXT\_FILE\_ARCHIVED\_TXT\_FILE\_COUNT | 7                                                                                                           | Max number of archived text files                                                                                         |
| LOGGING\_APPENDERS\_TXT\_FILE\_TIMEZONE                   | UTC                                                                                                         | Timezone for text file logging                                                                                            |
| LOGGING\_APPENDERS\_JSON\_FILE\_THRESHOLD                 | ALL                                                                                                         | Threshold for JSON file logging                                                                                           |
| Log Format (not overridable)                              | %-6level \[%d{HH:mm:ss.SSS}] \[%t] %logger{5} - %X{code} %msg %n                                            | Pattern for logging messages                                                                                              |
| Current Log Filename (not overridable)                    | /var/log/graphbuild/json/current/application\_${applicationName}\_${timeStamp}.json.log                     | Pattern for log file name                                                                                                 |
| LOGGING\_APPENDERS\_JSON\_FILE\_ARCHIVE                   | true                                                                                                        | Archive JSON log files                                                                                                    |
| Archived Log Filename Pattern (not overridable)           | /var/log/graphbuild/json/archive/application\_${applicationName}\_${timeStamp}\_to\_%d{yyyy-MM-dd}.json.log | Rollover frequency depends on the date pattern in the archived filename; for example, %d{yyyy-MM-ww} declares weekly rollover |
| LOGGING\_APPENDERS\_JSON\_FILE\_ARCHIVED\_FILE\_COUNT     | 7                                                                                                           | Max number of archived JSON files                                                                                         |
| LOGGING\_APPENDERS\_JSON\_FILE\_TIMEZONE                  | UTC                                                                                                         | Timezone for JSON file logging                                                                                            |
| LOGGING\_APPENDERS\_JSON\_FILE\_LAYOUT\_TYPE              | json                                                                                                        | The layout type for the json logger                                                                                       |


