Astrato’s “Unified Star Schema” (USS) in our data analytics engine is designed to tackle some of the most common problems of data modeling that occur in Business Intelligence (BI) outcomes.
Avoiding common data modeling errors
One of the most common problems is described as a Fan Trap. A Fan Trap is a combination (join) of two tables, where one table contains numbers (used in measures) and the second table when joined to the first, duplicates the rows containing numbers, leading to incorrect results.
Fan Trap example
In our example, Sales (master table) and Shipments (detail table) are joined by the Sales ID field.
Figure 1: Data key joins from Shipments to Sales, visualizing the many-to-one relationship.
The Sales ID is unique in Sales but not in Shipments (because one row of Sales may be associated with multiple rows of Shipments). As a result, when the two tables are joined, the Sales Amount is duplicated and no longer reconciles with the total of the data source.
Figure 2: Fan Trap outcome showing Sales Amount duplicated
There is nothing wrong with the data source, and there is nothing wrong with creating this join. In our example, the Shipment Amount measure shows 100 on each row, but these are not duplicates because each row of Shipments is associated with one row from the Sales table.
When discussing joins between tables, we describe it from the perspective of a many-to-one relationship. In our example, Shipment Sales ID is the ‘many’/foreign key from the Shipments (detail) table to Sales ID, which is the ‘one’/primary key on the Sales (master) table. The relationship can also be described as “Shipments points to Sales.”
To improve understanding and support troubleshooting your data model output, Astrato represents the join direction in your data view with an arrow that always starts from the ‘many’ side to the ‘one’ side.
Our data engine applies the UNION ALL operations by merging the two SELECT statements and then integrating the outcome by integrating a further SELECT statement with a GROUP BY aggregation to remove the duplicates.
Figure 3: Join direction showing the arrow always begins from many to one.
Please note: The engine's decision is uniquely based on the direction of the arrow and not on the data itself. That’s why the direction of the joins is so essential.
Creating Joins between tables
After selecting your tables in the Data View, no joins are defined. There are three options to create a new join:
Accept the join suggestions in the Joins dialog that displays.
Click the (+) icon beside the data field and drag the arrow from the primary table to the field in the other table.
Click Joins to view your joins in a list, click Add join, select the table and fields you want and click Create.
Figure 4: Click the Joins tab to view your joins in a list.
Understanding Join Quality
The foundation of join quality is based on ensuring your joins always use the fields containing multiples as the foreign key, with the unique field in your master table as the primary key.
The join quality table includes stats to explain how join quality is assigned to the join you've created. The join quality table includes these stats:
Keys count - number of unique values in the keys used in the join.
Rows count - number of overall rows in the table.
Key quality - Keys count divided by Rows count. We expect to see 100% on the master table on a one-to-many join. This indicates a good quality for the join.
Orphans - number of values from the Keys count that don't have a corresponding value in the other table. This information can help to ensure you've connected the correct fields. Usually, you expect to see close to zero on the join quality table.
Figure 5: Join Quality statistics showing 100% for Key Quality and zero Orphans.