Semi-Structured Specific Data Modelling

CSV and SQL files typically use a flat structure, while JSON and XML files can contain objects with varying levels of depth. The input blocks in the interface reflect the structure of the input data.

For example, in the donut.json file shown below:

The key batters contains an object whose batter key holds an array of objects with batter information.
The key topping contains an array of objects with topping details.

[
	{
		"id": "0001",
		"type": "donut",
		"name": "Cake",
		"ppu": 0.55,
		"batters": {
			"batter": [
				{
					"id": "1001",
					"type": "Regular"
				},
				{
					"id": "1002",
					"type": "Chocolate"
				},
				{
					"id": "1003",
					"type": "Blueberry"
				}
			]
		},
		"topping": [
			{
				"id": "5001",
				"type": "None"
			},
			{
				"id": "5002",
				"type": "Glazed"
			},
			{
				"id": "5003",
				"type": "Sugar"
			},
			{
				"id": "5004",
				"type": "Powdered Sugar"
			}
		]
	}
]

Although the actual file contains several donut objects, as well as additional toppings and batters, it has been simplified here for brevity. In the input tab, keys with nested levels are indicated with dropdown icons.
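To make the nesting levels concrete, the following is a minimal Python sketch that walks a trimmed copy of the sample and lists each field as a dotted path, together with its depth. It is only an illustration of the structure, not how the product builds its input blocks. The helper names field_paths and unique_field_paths are hypothetical.

```python
import json

# Trimmed copy of the donut.json sample (one batter and one topping kept).
DONUT_JSON = """
[
  {
    "id": "0001",
    "type": "donut",
    "name": "Cake",
    "ppu": 0.55,
    "batters": {"batter": [{"id": "1001", "type": "Regular"}]},
    "topping": [{"id": "5001", "type": "None"}]
  }
]
"""


def field_paths(node, prefix=""):
    """Yield the dotted path of every leaf field, ignoring array positions."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from field_paths(value, path)
    elif isinstance(node, list):
        # Array items share the path of the array itself.
        for item in node:
            yield from field_paths(item, prefix)
    else:
        yield prefix


def unique_field_paths(document):
    """Return each distinct field path once, in first-seen order."""
    return list(dict.fromkeys(field_paths(document)))


paths = unique_field_paths(json.loads(DONUT_JSON))
for path in paths:
    # Depth here is simply the number of dotted segments in the path.
    print(f"{path} (depth {path.count('.') + 1})")
```

In this sketch, ppu sits at depth 1, topping.id at depth 2, and batters.batter.id at depth 3.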

Relationship Restrictions for JSON and XML files

In multi-level files such as JSON and XML, there are restrictions on which fields can be joined. By default, only entities can have outgoing edges and join with other entities or attributes. In addition, for JSON and XML files, an outgoing edge or relationship can only be created between fields at the same level, or from a field to another field at a lower (more deeply nested) level.

For example, ppu can have an outgoing edge to batters.batter.id, but batters.batter.id cannot have an outgoing edge to ppu. Reverse relationships, where a child links to a parent, are usually an incorrect modelling decision. However, for XML files, advanced users can modify the XPath in the model file if there is a legitimate need to model data this way.
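The level rule can be sketched as a small check in Python. This is an illustration under stated assumptions, not the product's validation logic: a field's level is taken to be the number of segments in its dotted path, and the function names level and can_have_outgoing_edge are hypothetical. The sketch only compares depth and does not consider whether the two fields sit in the same branch.

```python
def level(path):
    """Assumed definition: nesting level = number of dotted path segments."""
    return path.count(".") + 1


def can_have_outgoing_edge(source, target):
    """Allow an edge to a field at the same level or at a lower (deeper) level."""
    return level(source) <= level(target)


# Parent to child is allowed, child to parent is not.
print(can_have_outgoing_edge("ppu", "batters.batter.id"))
print(can_have_outgoing_edge("batters.batter.id", "ppu"))
# Fields at the same level can be linked in either direction.
print(can_have_outgoing_edge("id", "ppu"))
```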

Removing JSON and XML Array Object Grouping

By default, JSON and XML array objects and their fields are grouped, so that each field appears as a single input block. When such a field is selected, all of its instances are iterated over as part of the model configuration. For example, the full donut.json file contains three donut objects, so selecting the top-level id as an entity will result in three entities. If the topping.id field is connected as an attribute, all four toppings related to each donut will be connected to the corresponding parent donut.id in the graph. This is how most users want to model their data.
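The effect of grouping can be pictured as a nested loop over every donut and every one of its toppings. The Python sketch below is only an illustration of that iteration, not the product's implementation. The function name grouped_edges is hypothetical, and the second donut ("0002") is made-up data added so that the loop has more than one parent to visit.

```python
def grouped_edges(donuts):
    """Pair every donut id with each of that donut's own topping ids."""
    edges = []
    for donut in donuts:
        for topping in donut.get("topping", []):
            edges.append((donut["id"], topping["id"]))
    return edges


donuts = [
    # Donut "0001" and its four toppings come from the sample file.
    {
        "id": "0001",
        "topping": [{"id": "5001"}, {"id": "5002"}, {"id": "5003"}, {"id": "5004"}],
    },
    # Hypothetical second donut, for illustration only.
    {"id": "0002", "topping": [{"id": "5001"}]},
]

edges = grouped_edges(donuts)
for parent_id, topping_id in edges:
    print(parent_id, "->", topping_id)
```

Each topping is attached only to the donut object that contains it, which matches the parent-child connection described above.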

It is possible to disable this grouping by using the switch in the input tab next to the input file name.

In this ungrouped view, fields from specific objects can be selected. For instance, Donut 1 can be linked only to its second batter, if needed.

Note: When using this mode, only fields from the same object hierarchy can be linked. Graph elements in this mode also cannot be linked with elements from the grouped mode.
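As a rough picture of the ungrouped view, each field can be thought of as addressed by an explicit path that includes array positions. The Python sketch below is illustrative only. The helpers select and same_hierarchy are hypothetical, and reading "same object hierarchy" as "same top-level donut" is an assumption made for this example.

```python
def select(document, *steps):
    """Follow a sequence of keys and array indexes down to one specific value."""
    node = document
    for step in steps:
        node = node[step]
    return node


def same_hierarchy(steps_a, steps_b):
    """Assumed rule: two selections are linkable if they start in the same top-level object."""
    return steps_a[0] == steps_b[0]


# Donut 1 from the sample file, reduced to its id and batters.
donuts = [
    {
        "id": "0001",
        "batters": {
            "batter": [
                {"id": "1001", "type": "Regular"},
                {"id": "1002", "type": "Chocolate"},
                {"id": "1003", "type": "Blueberry"},
            ]
        },
    }
]

donut_id_path = (0, "id")
second_batter_path = (0, "batters", "batter", 1, "id")

# Link Donut 1 only to its second batter (array index 1).
if same_hierarchy(donut_id_path, second_batter_path):
    edge = (select(donuts, *donut_id_path), select(donuts, *second_batter_path))
    print(edge)
```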

