Transformed Graph Data

The Transformer can create output data for both Semantic Knowledge Graphs and Property Graphs. The data produced for Knowledge Graphs is RDF, whereas for Property Graphs it takes the form of two CSV files, one for nodes and one for edges.

When creating RDF data for Semantic Knowledge Graphs, the Transformer supports a wide range of serialization formats: NQuads, NTriples, JSON-LD, Turtle, Trig, and Trix. By default, the resulting RDF is serialized as NQuads, or as NTriples if provenance is off; this can be changed by setting the configuration option OUTPUT_FILE_FORMAT to nquads, ntriples, jsonld, turtle, trig, or trix.

To create CSV output data for a Property Graph, turn on Property Graph mode by setting PROPERTY_GRAPH_MODE to true, then select your graph provider by setting PG_GRAPH to neptune, tigergraph, neo4j, or default, as sketched below.
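
How these options are set depends on how you deploy the Transformer; assuming environment-variable configuration, the two output modes might be selected like this (the values shown are illustrative, not defaults):

# Knowledge Graph output, serialized as Turtle instead of NQuads
OUTPUT_FILE_FORMAT=turtle

# Property Graph output (node and edge CSV files) targeting Neptune
PROPERTY_GRAPH_MODE=true
PG_GRAPH=neptune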

The RDF or CSV data files created and output by the Transformer are the same regardless of how it was triggered; only the way this information is communicated back to you varies slightly for each method.

Resulting Data

Endpoint

After being triggered via the Process endpoint, once ingestion and transformation of the data has completed successfully, the Transformer returns a response in the form of a JSON object. This JSON response contains a list of the URLs of all generated RDF files. Multiple files are only generated when an SQL Limit and Offset are specified in your mapping query, or in multiples of two files when using Property Graph mode.

Sample Knowledge Graph response:

{
    "successfulIterations": 3,
    "outputFileLocations": [
        "file:///data/sqltran/output/SQL-Transformer-31235bfd-a6fb-43aa-bb51-e3d41c481983.nq",
        "file:///data/sqltran/output/SQL-Transformer-5b27a2a4-c971-4fd4-b128-d131bf5c3981.nq",
        "file:///data/sqltran/output/SQL-Transformer-5ecd6e81-65ff-4dbc-aed8-d6d320b7d04a.nq"
    ],
    "processingTime": 20
}

Sample Property Graph response:

{
    "successfulIterations": 1,
    "outputFileLocations": [
        "file:///data/sqltran/output/SQL-Transformer-edges-31235bfd-a6fb-43aa-bb51-e3d41c481983.csv",
        "file:///data/sqltran/output/SQL-Transformer-vertices-5b27a2a4-c971-4fd4-b128-d131bf5c3981.csv"
    ],
    "processingTime": 20
}
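
A minimal sketch of triggering the Transformer and reading this response in Python; the URL and the use of a POST request are assumptions, so substitute the host, port, and path of your own Process endpoint:

import requests

# Hypothetical URL for the Process endpoint.
response = requests.post("http://localhost:8080/process")
response.raise_for_status()

result = response.json()
print(f"{result['successfulIterations']} iteration(s) in {result['processingTime']}s")

# Each entry is a file URL pointing at a generated RDF (.nq) or CSV file.
for location in result["outputFileLocations"]:
    print(location)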

Cron Job

If the ingestion has been triggered via the job scheduler, your confirmation of success will come in the form of log messages; where these can be found depends on your configuration.

Kafka

If you have a Kafka Cluster set up and running, the URL(s) of the successfully generated RDF file(s) will be pushed to your Kafka Queue. They are pushed to the Topic specified in the KAFKA_TOPIC_NAME_SUCCESS config option, which defaults to “success_queue”. This happens with both methods of triggering the Transformer. One of the many advantages of this approach is that the transformed data can then be ingested using our Graph Writer, which will publish the RDF to a Semantic Knowledge Graph or the CSV to a Property Graph of your choice!
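
As a rough sketch, a downstream service could watch this topic with any Kafka client; the broker address below is an assumption, and the example treats each message value as a plain file URL, so check your deployment for the exact payload format:

from kafka import KafkaConsumer

# Hypothetical broker address; the topic name is the KAFKA_TOPIC_NAME_SUCCESS default.
consumer = KafkaConsumer(
    "success_queue",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
)

for message in consumer:
    # Assumes the payload is the URL of a generated RDF or CSV file.
    file_url = message.value.decode("utf-8")
    print(f"New output file ready: {file_url}")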

Dead Letter Queue

If something goes wrong during the operation of the Transformer, the system will publish a message to the Dead Letter Queue Kafka topic (defaults to “dead_letter_queue”) explaining what went wrong, along with metadata about that ingestion, allowing the problem to be diagnosed and the data to be re-ingested later. If enabled, the provenance generated for the current ingestion will also be included as JSON-LD. This message will be in the form of a JSON object with the following structure:

{
	"name": "SQL-Transformer",
	"time": "2022-04-21T11:36:10.002",
	"type": "SQL-Transformer",
	"error": "Record could not be processed due to: Invalid Database Credentials.",
	"version": "2.0.3",
	"provenance": "...[prov]..."
}
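
A minimal sketch of monitoring this topic, assuming the same hypothetical broker address as above and messages shaped like the sample object:

import json
from kafka import KafkaConsumer

# Hypothetical broker address; the topic name is the Dead Letter Queue default.
consumer = KafkaConsumer(
    "dead_letter_queue",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    record = message.value
    # Fields follow the structure of the sample message above.
    print(f"{record['time']} {record['name']} v{record['version']}: {record['error']}")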

Provenance Data

Within the SQL Transformer, time-series data is supported as standard: every time a Transformer ingests some data, we add provenance information. This means that you have a full record of the data over time, allowing you to see what the state of the data was at any moment. The model we use to record provenance information is the W3C standard PROV-O model.

Provenance files are uploaded to the location specified by PROV_OUTPUT_DIR_URL, and this file location is then pushed to the Kafka Topic declared in PROV_KAFKA_TOPIC_NAME_SUCCESS. The provenance activities in the SQL Transformer are main-execution, kafkaActivity, queryProcessing, tableProcessing, and singleQueryExecution.

For more information on how the provenance is laid out, as well as how to query it from your Triple Store, see the Provenance Guide.
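
As a rough illustration of what such a query can look like, the sketch below lists recorded PROV-O activities over the standard SPARQL protocol; the repository URL is an assumption, so substitute the endpoint of your own Triple Store:

import requests

# Hypothetical SPARQL endpoint of the Triple Store holding the provenance graph.
SPARQL_ENDPOINT = "http://localhost:7200/repositories/provenance"

query = """
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?activity WHERE { ?activity a prov:Activity . }
LIMIT 100
"""

response = requests.post(
    SPARQL_ENDPOINT,
    data={"query": query},
    headers={"Accept": "application/sparql-results+json"},
)
response.raise_for_status()

# Each binding is one recorded activity (e.g. main-execution, queryProcessing).
for row in response.json()["results"]["bindings"]:
    print(row["activity"]["value"])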
