Configuring the Writer

As with the Transformers supplied by Graph.Build, the Graph Writer has a wide array of user configuration, all of which can be set before the Writer starts up and altered while it is running. The former is done through environment variables in your Docker container or ECS Task Definition, and the latter through exposed endpoints, as seen below. For a breakdown of every configuration option in the Graph Writer, see the full list here.

Configuration Access

Accessing the Config

Once the Writer has started and is operational, you can view the current configuration by calling the /config endpoint. This is expanded upon below, including the ability to request specific config properties.
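
For example, assuming the Writer is reachable on localhost port 8080 (substitute the host and port of your own deployment), the current configuration can be fetched with a plain HTTP request:

    # Fetch the full configuration from a running Writer
    # (host and port are placeholders for your own deployment)
    curl http://localhost:8080/config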

Editing the Config

As explained below, the configuration on a running Writer can be edited through the /updateConfig endpoint.
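
A rough sketch of such an update is shown below. The host, port, HTTP method, and request shape (a JSON body naming the property to change) are illustrative assumptions only; consult the full configuration reference for the exact format the /updateConfig endpoint expects.

    # Hypothetical sketch: update a single property on a running Writer.
    # The method and JSON body are assumed formats, not documented ones.
    curl -X POST "http://localhost:8080/updateConfig" \
         -H "Content-Type: application/json" \
         -d '{"GRAPH_DATABASE_ENDPOINT": "http://my-graphdb:7200/repositories/demo"}'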

Backup and Restore Config

A useful feature of the Writer is the ability to back up and restore your configuration. This is particularly beneficial when you have made multiple changes to the config on a running Writer and want to restore them later without rerunning each update config command. To back up your config, simply call the /uploadConfigBackup endpoint, and all changes you have made to the config will be uploaded to the storage location specified in your CONFIG_BACKUP env var.

Restoring your configuration can only be done at the startup of a Writer, so set the CONFIG_BACKUP config option as an environment variable in your startup script / task definition. The backup location must be a remote directory such as S3, as anything stored locally will be deleted when a task or container is stopped.
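
As a rough sketch, a backup followed by a restore on the next startup might look like the following; the host, port, HTTP method, bucket path, and image name are all placeholders.

    # Back up all runtime config changes to the location named in CONFIG_BACKUP
    curl -X POST http://localhost:8080/uploadConfigBackup

    # Restore on the next startup by pointing CONFIG_BACKUP at a remote directory;
    # a local path would be lost when the task or container stops.
    docker run -d \
      -e CONFIG_BACKUP=s3://my-bucket/writer-config/ \
      graphbuild/graph-writer:latest   # image name is a placeholder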

Configuration Categories

Mandatory Configuration (Local Deployment)

  • License - TRANSFORMER_LICENSE

    • This is the license key required to operate the Writer when it is run on a local machine outside of the AWS Marketplace; request your new unique license key here. A minimal startup example follows this list.
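
For a local deployment, the license key is passed as an environment variable at startup. A minimal sketch, assuming a hypothetical image name, is:

    # Minimal local startup with a license key (image name is a placeholder)
    docker run -d \
      -e TRANSFORMER_LICENSE=<your-license-key> \
      graphbuild/graph-writer:latest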

Graph Database Configuration

  • Graph Database Endpoint - GRAPH_DATABASE_ENDPOINT

    • This is the endpoint of the Graph Database you wish to upload your Graph Data to, and is therefore required for the Graph Writer to work.

  • Graph Database Type - GRAPH_DATABASE_TYPE

    • This is your Graph Database type. Some graphs support the default sparql type (e.g. AllegroGraph), however certain graphs require a specific type declaration; these include graphdb, stardog, blazegraph, neptune-sparql, and rdfox.

    • If you are using a Property Graph, you can set a specific Graph provider (neo4j, neptune-cypher, or neptune-gremlin), or the traversal language cypher or gremlin.

  • Graph Database Username and Password - GRAPH_DATABASE_USERNAME and GRAPH_DATABASE_PASSWORD

    • This is the username and password for your Graph Database. You can leave these fields blank if your Graph does not require any authentication. A combined startup sketch follows this list.
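
Putting these options together, a startup command for a Writer pointed at, say, a GraphDB repository might look like the sketch below. All values, including the endpoint, credentials, and image name, are illustrative placeholders.

    # Example only: connect the Writer to a GraphDB repository
    docker run -d \
      -e TRANSFORMER_LICENSE=<your-license-key> \
      -e GRAPH_DATABASE_ENDPOINT=http://my-graphdb:7200/repositories/demo \
      -e GRAPH_DATABASE_TYPE=graphdb \
      -e GRAPH_DATABASE_USERNAME=admin \
      -e GRAPH_DATABASE_PASSWORD=changeme \
      graphbuild/graph-writer:latest   # image name is a placeholder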

AWS Configuration

When running the Writer in ECS, these settings are not required, as all credentials are taken directly from the EC2 instance running the Writer. If you wish to use AWS cloud services while running the Writer on-prem, you need to specify an AWS Access Key, Secret Key, and AWS Region. Providing your AWS credentials gives the Writer permission to access, download, and upload remote files in S3 Buckets, while the region option specifies where in AWS your files and services reside. The Writer uses the AWS Default Credential Provider Chain, so credentials can be supplied in a number of ways; the simplest is to set the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION environment variables. Please note that all services must be in the same region, including the EC2 instance if you choose to run the Writer there.
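
For an on-prem deployment, that simplest route is to export the variables (or pass them as -e flags to the container) before starting the Writer. The values below are placeholders, and the region is only an example:

    # Supply AWS credentials via the Default Credential Provider Chain
    export AWS_ACCESS_KEY_ID=<your-access-key>
    export AWS_SECRET_ACCESS_KEY=<your-secret-key>
    export AWS_REGION=eu-west-2   # example region; use the region your services run in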

Kafka Configuration

One of the many ways to interface with the Writer is through the use of Apache Kafka. With the Graph Writer, a Kafka Message Queue can be used for managing the input of RDF data into the Writer. To properly set up your Kafka Cluster, see the instructions here. Once complete, use the following Kafka configuration variables to connect the cluster with your Writer. If you do not wish to use Kafka, please set the variable TRANSFORMER_RUN_STANDALONE to true.

The Kafka Broker property tells the Writer where to find your Kafka Cluster, so set it in the form <kafka-ip>:<kafka-port>. The recommended port is 9092.
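
As an illustration, the Kafka-related settings might be supplied at startup as in the sketch below. The variable name KAFKA_BROKER is an assumption used here for readability, so check the full configuration list for the exact property name; the broker address and image name are also placeholders.

    # Connect the Writer to a Kafka cluster (the name KAFKA_BROKER is assumed)
    docker run -d \
      -e TRANSFORMER_RUN_STANDALONE=false \
      -e KAFKA_BROKER=10.0.0.12:9092 \
      graphbuild/graph-writer:latest   # image name is a placeholder

    # Or skip Kafka entirely and run the Writer standalone
    docker run -d \
      -e TRANSFORMER_RUN_STANDALONE=true \
      graphbuild/graph-writer:latest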

Provenance Configuration

Currently, the Graph Writer does not generate its own provenance meta-data, and so the RECORD_PROVO configuration option is set to false. However, any provenance generated previously is separate from this option and will still be ingested into your Knowledge Graph. If you are using Kafka and your Writer provenance is pushed to a separate queue from your generated output data, ensure that your Kafka source topic is configured accordingly.

Logging Configuration

Logging in the Graph Writer works the same way as in the Transformers and, like most functionality, is configurable through environment variables. When running the Graph Writer locally from the command line, the Writer will automatically log to your terminal instance. In addition, archived logs are saved within the Docker container at /var/log/graphbuild/text/archive/ and /var/log/graphbuild/json/archive/ for text and JSON logs respectively, while the current logs can be found at /var/log/graphbuild/text/current/ and /var/log/graphbuild/json/current/. By default, a maximum of 7 log files is archived for each file type, although this can be overridden. If running the Writer in the cloud in an AWS environment, connect to your instance via SSH or PuTTY; the logging locations outlined above still apply.

By default, the Writer logs at INFO level. This can be changed by overriding the LOG_LEVEL_TRANSFORMER option, which is only read at Writer startup, so changing it on a running Writer requires a restart to take effect.
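
For example, the log level can be raised at startup and the current logs followed inside a running container. The image name, container ID, and exact file names under current/ are placeholders:

    # Start the Writer at DEBUG level (LOG_LEVEL_TRANSFORMER is read at startup)
    docker run -d \
      -e LOG_LEVEL_TRANSFORMER=DEBUG \
      graphbuild/graph-writer:latest   # image name is a placeholder

    # Follow the current text logs inside the container
    docker exec -it <container-id> sh -c 'tail -f /var/log/graphbuild/text/current/*'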

Optional Configuration

There is also a further selection of optional configuration for specific situations; see here for the full list.
