Transformed Graph Data

The Transformer can create output data for both Semantic Knowledge Graphs and Property Graphs. The data produced for Knowledge Graphs is RDF, whereas for Property Graphs it takes the form of two CSV files, one for nodes and one for edges.

When creating RDF data for Semantic Knowledge Graphs, the Transformer supports a number of serialization formats: N-Quads, N-Triples, JSON-LD, Turtle, TriG, and TriX. By default, the resulting RDF is represented as N-Quads, or as N-Triples if provenance is off; however, this can be changed by setting the configuration option OUTPUT_FILE_FORMAT to nquads, ntriples, jsonld, turtle, trig, or trix.

To create CSV output data for a Property Graph, you must turn on Property Graph mode by setting PROPERTY_GRAPH_MODE to true and then select your graph provider by setting PG_GRAPH to neptune, tigergraph, neo4j, or default.
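
The sketch below is a minimal, hypothetical example of wiring these options together, assuming the Transformer runs as a Docker container configured through environment variables; the image name and tag are placeholders for whatever your deployment uses.

import subprocess

# Build the environment for the container. PROPERTY_GRAPH_MODE switches between
# RDF output (false) and node/edge CSV output (true); PG_GRAPH picks the provider;
# OUTPUT_FILE_FORMAT only applies when Property Graph mode is off.
environment = {
    "OUTPUT_FILE_FORMAT": "turtle",
    "PROPERTY_GRAPH_MODE": "false",
    "PG_GRAPH": "neo4j",
}

command = ["docker", "run", "--rm"]
for key, value in environment.items():
    command += ["-e", f"{key}={value}"]
command.append("ss-transformer:latest")  # placeholder image name

subprocess.run(command, check=True)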

The RDF or CSV data files created and output by the Transformer are the same regardless of how it was triggered; only the way in which this information is communicated back to you varies slightly for each method.

Resulting Data

Endpoint

Once an input file ingested via the Process endpoint has been processed successfully, the Transformer returns a JSON object. Within this JSON response is the outputFileLocations element, which contains a list of the URLs of all generated output files. Usually this is a single file (or two files for Property Graphs); however, multiple files will be generated and listed when ingesting large CSV files.

Sample Knowledge Graph output:

{
    "input": [
        "file:///var/local/input/input-data.csv"
    ],
    "failedIterations": 0,
    "successfulIterations": 1,
    "outputFileLocations": [
        "file:///var/local/output/Semi-Structured-Transformer-44682bd6-3fbc-429b-988d-40dda8892328.nq"
    ],
    "mappingFiles": [
        "file:///var/local/mappings/mapping.ttl"
    ]
}

Sample Property Graph output:
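
An illustrative response is shown below, assuming the same structure as the Knowledge Graph example; the node and edge CSV file names are placeholders.

{
    "input": [
        "file:///var/local/input/input-data.csv"
    ],
    "failedIterations": 0,
    "successfulIterations": 1,
    "outputFileLocations": [
        "file:///var/local/output/nodes.csv",
        "file:///var/local/output/edges.csv"
    ],
    "mappingFiles": [
        "file:///var/local/mappings/mapping.ttl"
    ]
}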

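As a minimal sketch of using the Process endpoint from code, the example below assumes the Transformer is reachable at http://localhost:8080/process and accepts the input file URL as a query parameter named inputFileURL; both the address and the parameter name are assumptions, so check them against your deployment.

import requests

# Hypothetical endpoint address and parameter name; adjust to your deployment.
response = requests.get(
    "http://localhost:8080/process",
    params={"inputFileURL": "file:///var/local/input/input-data.csv"},
)
response.raise_for_status()

body = response.json()
print("Failed iterations:", body["failedIterations"])
for url in body["outputFileLocations"]:
    # Each entry is the URL of a generated RDF file (or a node/edge CSV file
    # when Property Graph mode is on).
    print("Generated file:", url)
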
Kafka

If you have a Kafka cluster set up and running, the URL(s) of the successfully generated file(s) will be pushed to your Kafka queue, on the topic specified in the KAFKA_TOPIC_NAME_SUCCESS config option (which defaults to “success_queue”). This happens with both methods of triggering the Transformer. One of the advantages of this approach is that the transformed data can then be ingested using our Graph Writer, which will publish the RDF to a Semantic Knowledge Graph or the CSV to a Property Graph of your choice.
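
As a rough sketch of consuming these notifications, assuming a broker on localhost:9092, the default topic name, and that each message value is a generated file URL (kafka-python is used purely for illustration):

from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "success_queue",                     # default KAFKA_TOPIC_NAME_SUCCESS value
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_deserializer=lambda raw: raw.decode("utf-8"),
)

for message in consumer:
    # Each value is expected to be the URL of a generated RDF or CSV file,
    # ready to be handed on to the Graph Writer.
    print("New output file:", message.value)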

Dead Letter Queue

If something goes wrong during the operation of the Transformer, the system will publish a message to the Dead Letter Queue Kafka topic (which defaults to “dead_letter_queue”) explaining what went wrong, along with metadata about that ingestion, allowing the problem to be diagnosed and the data re-ingested later. If enabled, the provenance generated for the current ingestion will also be included as JSON-LD. This message takes the form of a JSON object.
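
A similar sketch for monitoring the Dead Letter Queue, assuming the default topic name, a broker on localhost:9092, and a JSON message body as described above:

import json

from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "dead_letter_queue",                 # default Dead Letter Queue topic name
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    # Print the failure payload so the ingestion can be diagnosed and re-run.
    print(json.dumps(message.value, indent=2))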

Provenance Data

Within the SS Transformer, time-series data is supported as standard: every time a Transformer ingests some data, we add provenance information. This means that you have a full record of the data over time, allowing you to see what the state of the data was at any moment. The model we use to record provenance information is the W3C standard PROV-O model.

Provenance files are uploaded to the location specified by the PROV_OUTPUT_DIR_URL config option, and each file location is then pushed to the Kafka topic declared in PROV_KAFKA_TOPIC_NAME_SUCCESS. The provenance activities in the SS Transformer are main-execution, kafkaActivity, and transformer-iteration.
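
Once a provenance file has been retrieved, it can be inspected with any RDF library. The sketch below uses rdflib and assumes the file is serialized as JSON-LD; swap the format argument if your provenance output uses a different serialization, and note that treating the activities above as prov:Activity instances is an assumption based on the PROV-O model.

from rdflib import Graph, RDF
from rdflib.namespace import PROV

# Placeholder path; in practice, use a file location taken from the
# PROV_KAFKA_TOPIC_NAME_SUCCESS topic. JSON-LD parsing requires rdflib 6+.
graph = Graph()
graph.parse("provenance-output.jsonld", format="json-ld")

# List each PROV-O activity together with its start and end times.
for activity in graph.subjects(RDF.type, PROV.Activity):
    started = graph.value(subject=activity, predicate=PROV.startedAtTime)
    ended = graph.value(subject=activity, predicate=PROV.endedAtTime)
    print(activity, started, ended)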

For more information on how the provenance is laid out, as well as how to query it from your Triple Store, see the Provenance Guide.
