All Configuration Options

Below are tables containing all of the configurable options within the Graph Writer. These can be set when configuring your stack or on an already running Writer.

To update a running Writer, call the endpoint `/updateConfig?configEntry=<entry>&configValue=<value>`, where entry is the config item's Entry name as listed below and value is the new value you wish to set. Any configuration changed while the Writer is running can also be backed up and restored.
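As a minimal sketch, the snippet below updates the ingestion mode on a running Writer over HTTP. The host, port, and use of a plain GET request are assumptions for illustration; adjust them to match your deployment.

```python
import urllib.parse
import urllib.request

# Assumed Writer address -- replace with your deployment's host and port.
WRITER_URL = "http://localhost:8080"

def update_config(entry: str, value: str) -> str:
    """Set a single config entry on a running Writer via /updateConfig."""
    query = urllib.parse.urlencode({"configEntry": entry, "configValue": value})
    with urllib.request.urlopen(f"{WRITER_URL}/updateConfig?{query}") as resp:
        return resp.read().decode()

# Switch the ingestion mode from the default 'insert' to 'update'.
print(update_config("ingestionMode", "update"))
```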

Transformer Writer Configuration

| Environment Variable | Entry | Default Value | Description |
| --- | --- | --- | --- |
| `FRIENDLY_NAME` | `friendlyName` | `Graph-Writer` | The name you wish to give your Writer. |
| `TRANSFORMER_LICENSE` | `transformerLicense` | | The license key required for running the Writer. Only required when running a non-AWS Marketplace version of the Writer. |
| `GRAPH_DATABASE_ENDPOINT` | `graphDatabaseEndpoint` | | The endpoint of the graph database you wish to upload your data to. |
| `GRAPH_DATABASE_TYPE` | `graphDatabaseType` | `sparql` | The Knowledge Graph type. Some Semantic Graphs support the default `sparql` type (e.g. AllegroGraph), but certain graphs require a specific type declaration: `graphdb`, `stardog`, `blazegraph`, `neptune-sparql` or `rdfox`. If you are using a Property Graph, you can set a specific provider (`neo4j`, `neptune-cypher`, `neptune-gremlin`) or a traversal language (`cypher` or `gremlin`). |
| `GRAPH_DATABASE_REASONING` | `graphDatabaseReasoning` | `false` | Whether you want reasoning enabled or disabled. This only applies to Semantic Graphs. |
| `GRAPH_DATABASE_USERNAME` | `graphDatabaseUsername` | | The username for your Graph Database. Leave blank if your Knowledge Graph does not require authentication. |
| `GRAPH_DATABASE_PASSWORD` | `graphDatabasePassword` | | The password for your Graph Database. Leave blank if your Graph Database does not require authentication. |
| `CONFIG_BACKUP` | `configBackup` | `file:///var/local/config-backup/` | The URL of the directory the config is backed up to when calling the upload config endpoint. |
| `DELETE_SOURCE` | `deleteSourceFile` | `false` | Whether to delete the source data file after it has been written to the Graph Database. |
| `TRANSFORMER_RUN_STANDALONE` | `runStandalone` | `true` | The Graph Writer is designed to run as part of a larger end-to-end system in which the Transformer provides the Writer with RDF or CSV files to write to a Graph Database, using Kafka to communicate between services. This integration is turned off by default; to run the Graph Writer with connected services, set this property to `false`. |
| `INGESTION_MODE` | `ingestionMode` | `insert` | How to process the ingested data. 'insert': the new data is ingested in full and does not replace existing data; the new dataset adds new values to already existing subject-predicates. 'update': the new data is used to update the existing data; the new dataset replaces values in existing subject-predicates. Note this only applies to Semantic Graphs; in Property Graphs the Writer defaults to an upsert pattern, updating the properties of a given node or edge. |
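As a sketch of how these options might be supplied as environment variables, the snippet below launches a Writer container with `docker run` via Python's subprocess module. The image name `graph-writer:latest` and the endpoint URL are placeholders for illustration; which variables you need depends on your graph database.

```python
import subprocess

# Hypothetical image name -- substitute the actual Graph Writer image.
IMAGE = "graph-writer:latest"

env = {
    "FRIENDLY_NAME": "Graph-Writer",
    "GRAPH_DATABASE_ENDPOINT": "http://localhost:7200/repositories/demo",  # example endpoint
    "GRAPH_DATABASE_TYPE": "graphdb",  # GraphDB requires an explicit type declaration
    "INGESTION_MODE": "insert",
    "TRANSFORMER_RUN_STANDALONE": "true",
}

# Flatten the mapping into repeated `-e KEY=VALUE` flags for docker run.
flags = [arg for key, value in env.items() for arg in ("-e", f"{key}={value}")]
subprocess.run(["docker", "run", *flags, IMAGE], check=True)
```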

Kafka Configuration

| Environment Variable | Entry | Default Value | Description |
| --- | --- | --- | --- |
| `KAFKA_BROKERS` | `kafkaBrokers` | `localhost:9092` | Tells the Writer where to find your Kafka cluster. Set with the structure `<kafka-ip>:<kafka-port>`. The recommended port is 9092. |
| `KAFKA_TOPIC_NAME_SOURCE` | `topicNameSource` | `success_queue` | The topic the Consumer reads messages from; each message contains the URL of a source data file to ingest. |
| `KAFKA_TOPIC_NAME_DLQ` | `topicNameDLQ` | `dead_letter_queue` | The topic used to push messages containing reasons for failure within the Writer. These messages are represented as JSON. |
| `KAFKA_GROUP_ID_CONFIG` | `groupIdConfig` | `consumerGroup1` | The identifier of the group this consumer belongs to. |
| `KAFKA_AUTO_OFFSET_RESET_CONFIG` | `autoOffsetResetConfig` | `earliest` | What to do when there is no initial offset in Kafka or an offset is out of range. 'earliest': automatically reset the offset to the earliest offset. 'latest': automatically reset the offset to the latest offset. |
| `KAFKA_MAX_POLL_RECORDS` | `maxPollRecords` | `100` | The maximum number of records returned in a single call to poll. |
| `KAFKA_TIMEOUT` | `timeout` | `1000` | The Kafka consumer polling timeout. |
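When `runStandalone` is set to `false`, the Writer consumes file-URL messages from the source topic. As a minimal sketch of what a connected upstream service might do, the snippet below publishes a source-file URL to `success_queue` using the kafka-python client; the plain-URL payload is an assumption, so confirm the message format your pipeline actually uses.

```python
from kafka import KafkaProducer  # pip install kafka-python

# Matches the default kafkaBrokers value.
producer = KafkaProducer(bootstrap_servers="localhost:9092")

# Publish the URL of a source data file for the Writer to ingest.
# The plain-URL payload is an assumption; check your pipeline's contract.
producer.send("success_queue", b"file:///data/output/dataset.ttl")
producer.flush()
```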

Provenance Configuration

| Environment Variable | Entry | Default Value | Description |
| --- | --- | --- | --- |
| `RECORD_PROVO` | `recordProvo` | `false` | Currently, the Graph Writer does not generate its own provenance metadata, so this option is set to false. |

Logging Configuration

| Environment Variable | Default Value | Description |
| --- | --- | --- |
| `LOG_LEVEL_TRANSFORMER` | `INFO` | Log level for Writer/Transformer loggers. Change to DEBUG to see more in-depth logs, or to WARN or ERROR to quieten the logging. |

Additional Logging Configuration

| Environment Variable | Default Value | Description |
| --- | --- | --- |
| `LOGGING_LEVEL` | `WARN` | Global log level. |
| `LOGGING_APPENDERS_CONSOLE_TIMEZONE` | `UTC` | Timezone for console logging. |
| `LOGGING_APPENDERS_TXT_FILE_THRESHOLD` | `ALL` | Threshold for text file logging. |
| Log Format (not overridable) | `%-6level [%d{HH:mm:ss.SSS}] [%t] %logger{5} - %X{code} %msg %n` | Pattern for logging messages. |
| Current Log Filename (not overridable) | `/var/log/graphbuild/text/current/application_${applicationName}_${timeStamp}.txt.log` | Pattern for the log file name. |
| `LOGGING_APPENDERS_TXT_FILE_ARCHIVE` | `true` | Archive text log files. |
| Archived Log Filename Pattern (not overridable) | `/var/log/graphbuild/text/archive/application_${applicationName}_${timeStamp}_to_%d{yyyy-MM-dd}.txt.log` | Log file rollover frequency depends on the date pattern in this property; for example, `%d{yyyy-MM-ww}` declares weekly rollover. |
| `LOGGING_APPENDERS_TXT_FILE_ARCHIVED_TXT_FILE_COUNT` | `7` | Max number of archived text files. |
| `LOGGING_APPENDERS_TXT_FILE_TIMEZONE` | `UTC` | Timezone for text file logging. |
| `LOGGING_APPENDERS_JSON_FILE_THRESHOLD` | `ALL` | Threshold for JSON file logging. |
| Log Format (not overridable) | `%-6level [%d{HH:mm:ss.SSS}] [%t] %logger{5} - %X{code} %msg %n` | Pattern for logging messages. |
| Current Log Filename (not overridable) | `/var/log/graphbuild/json/current/application_${applicationName}_${timeStamp}.json.log` | Pattern for the log file name. |
| `LOGGING_APPENDERS_JSON_FILE_ARCHIVE` | `true` | Archive JSON log files. |
| Archived Log Filename Pattern (not overridable) | `/var/log/graphbuild/json/archive/application_${applicationName}_${timeStamp}_to_%d{yyyy-MM-dd}.json.log` | Log file rollover frequency depends on the date pattern in this property; for example, `%d{yyyy-MM-ww}` declares weekly rollover. |
| `LOGGING_APPENDERS_JSON_FILE_ARCHIVED_FILE_COUNT` | `7` | Max number of archived JSON files. |
| `LOGGING_APPENDERS_JSON_FILE_TIMEZONE` | `UTC` | Timezone for JSON file logging. |
| `LOGGING_APPENDERS_JSON_FILE_LAYOUT_TYPE` | `json` | The layout type for the JSON logger. |
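Assuming the JSON appender writes one JSON document per line (a common convention, but an assumption here, as are the `level`, `timestamp` and `message` field names), a quick way to filter the current log for warnings and errors might look like the sketch below; the glob pattern simply mirrors the non-overridable filename pattern above.

```python
import glob
import json

# Mirrors the non-overridable current-log filename pattern for the JSON appender.
for path in glob.glob("/var/log/graphbuild/json/current/application_*.json.log"):
    with open(path) as log_file:
        for line in log_file:
            record = json.loads(line)  # assumes one JSON object per line
            # Field names are assumptions; inspect a real log line to confirm.
            if record.get("level") in ("WARN", "ERROR"):
                print(record.get("timestamp"), record.get("message"))
```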
