Modelling Multiple Semi-Structured Source Files

The Semi-Structured Transformer allows up to five different source data files to be used within a single model. Relationships (edges) can be created between entities (nodes) from different sources. Without a join condition, however, such a relationship generates a Cartesian product, in which every instance of node type X is linked to every instance of node type Y. A join condition prevents this by ensuring that an edge is created only when a specified field from each source file contains matching values.
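The difference between an unjoined and a joined relationship can be illustrated with a minimal Python sketch. This is not the Transformer's own implementation; the records and the field names (name, id, card) are hypothetical stand-ins for two source files.

```python
from itertools import product

# Hypothetical records standing in for two source files.
people = [{"name": "Ana", "card": "1111"}, {"name": "Ben", "card": "2222"}]
transactions = [
    {"id": "t1", "card": "1111"},
    {"id": "t2", "card": "2222"},
    {"id": "t3", "card": "1111"},
]

# Without a join condition: every person pairs with every transaction.
unjoined = [(p["name"], t["id"]) for p, t in product(people, transactions)]

# With a join condition: keep only pairs whose card values match.
joined = [
    (p["name"], t["id"])
    for p, t in product(people, transactions)
    if p["card"] == t["card"]
]

print(len(unjoined), "edges without a join condition")
print(len(joined), "edges with a join condition:", joined)
```

With two people and three transactions, the unjoined case yields all six possible pairs, while the joined case keeps only the three pairs that share a card value.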

In the example below, two source files are used: one containing transaction data and another containing person and card data.

  • The Transaction and Location nodes are sourced from the new-transactions.csv file.

  • The Person node is sourced from the person-card.json file.

An edge has been created to link the Person node to the Location node using the madeTransactionIn relationship. To ensure that edges are only created between people and their relevant transaction data, a join condition must be specified on the edge.

In this case, the if and equals fields are used to select data from each source file. Both files contain credit card numbers, so edges will only be created between nodes for records where the card numbers match.
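The effect of this join condition can be sketched in Python as follows. The sketch only mimics the outcome and does not use the Transformer's configuration format; the file contents and the field names (transaction_id, card_number, location, name) are assumptions made for illustration, since the actual layout of new-transactions.csv and person-card.json is not shown here.

```python
import csv
import io
import json

# Assumed file contents; the real column and key names may differ.
TRANSACTIONS_CSV = """transaction_id,card_number,location
t1,4000-0001,London
t2,4000-0002,Paris
t3,4000-0001,Berlin
"""

PERSON_CARD_JSON = """[
  {"name": "Ana", "card_number": "4000-0001"},
  {"name": "Ben", "card_number": "4000-0003"}
]"""


def made_transaction_in_edges(csv_text, json_text):
    """Return (person, relationship, location) triples for matching card numbers."""
    # Index people by card number so each transaction row needs one lookup.
    people_by_card = {}
    for person in json.loads(json_text):
        people_by_card.setdefault(person["card_number"], []).append(person["name"])

    edges = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Rows whose card number has no matching person produce no edge.
        for name in people_by_card.get(row["card_number"], []):
            edges.append((name, "madeTransactionIn", row["location"]))
    return edges


edges = made_transaction_in_edges(TRANSACTIONS_CSV, PERSON_CARD_JSON)
for edge in edges:
    print(edge)
```

In this sample data, Ana's card appears in two transactions, so two edges are created for her. Ben's card appears in no transaction and the Paris transaction has no matching person, so neither produces an edge.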
