The Join processor is used to aggregate two data sources using one of several join types (e.g. Inner, outer, etc). The join operator links the records from two data sources by matching values in specified columns.
Elixir Data Designer supports all the three types of Joins: inner join, outer join and cross join.
The Join processor can be selected from the Designer menu bar and then placed on the diagram workspace. The Join processor must get input data from two DataSources.
Properties
The Join Processor properties are shown in Figure 4.5, “Join Wizard”.
The name of the Join can be entered in the text box provided. The Choose Colour dialog box is invoked on clicking the Colour button from which the colour of the Join processor can be selected.
The dialog consists of three tabs: Options, Primary and Secondary.
In the Options tab, a check box is provided for the Cross join. When this check box is selected the Cross Join is performed on the two data sources. The cross join produces every combination of input records. For example, if the primary datasource provides ten records and the secondary datasource provides five records, then fifty records (10x5) will be available as the output. For large datasets a huge number of records could be generated.
There are two options Keep and Discard provided in the Combo box for "If no matching secondary". If the keep option is selected, then an Outer join can be performed on the data sources. If the Discard option is selected, then an Inner join can be performed on the data sources.
Similarly, three options Repeat, Keep and Discard are provided for "If multiple matching secondaries". These control how multiple matches are handled. If the primary record matches three secondary records, the system can a) pass through three records, matching the single primary with each of the secondaries in turn (Repeat). b) just pass through the primary with just the first secondary (Keep), c) not pass through the record at all (Discard).
The Primary prefix and the Secondary prefix can be entered in the text boxes if required. The Primary prefix and Secondary prefix are used to modify the field names in the merged output record based on the primary and secondary names in case they conflict.
For example, an Employee DataSource with an Id and DepartmentId would conflict with a Department DataSource with its own Id. If both Ids need to be passed through, then one or both of them should be given a prefix to prevent duplicate names.
The fields present in the primary data source are displayed in the Primary tab when the data source is connected with the Join processor . The fields of the primary data source that are not required can be discarded to reduce memory use.
The fields present in the secondary data source are displayed in the Secondary tab when the data source is connected with the Join processor. The fields of the secondary data source that are not required can be discarded. Choose the appropriate key or keys in the secondary tab and associate them with the corresponding primary keys. Records will be selected from the secondary data source where the identified keys match those of the primary.
Here's how to join the Sales and Customer data sources:
Add the JDBC data sources, Sales and Customer using the procedure given in the previous chapter.
Add a Composite DataSource with the name
Join
.
Select the Sales data source, drag and drop it in the designer window. Similarly drag and drop the Customer data source onto the designer.
Create a Join processor on the diagram.
Connect the Customer data source as primary and Sales data source as secondary to the Join processor and connect the output to the Result. The diagram flow is shown in Figure 4.6, “Sample Join Flow”.
Inner Join
Inner join combines column values from one record of a data source with column values of a record from another (or the same) data source to form a single, merged record of data.
Using an Inner join on the Sales and Customer data sources you can explore the result of setting different options for the "If multiple matching secondaries".
a) If multiple matching secondaries - Repeat
Using the flow described above, open the Join properties and select the
Discard
option from the "If no
matching secondary" Combo box.
Select the Repeat
option from
the "If multiple matching secondaries" combo box.
Select the secondary tab. Select the customer_id field in the primary column against the customer_id field of the secondary data source as shown in Figure 4.7, “Join Wizard”.
Click the Finish button to close the Join Wizard and return back to the Designer window.
Select the Result, and choose View from the popup menu. The output is shown in the Figure 4.8, “Inner Join Result”.
This query fetches all primary records which have one or more matching secondary records. The primary record is repeated once for each secondary match. If there is no secondary match, the primary record is discarded.
b) If multiple matching secondaries - Keep
This query fetches all primary records which have one or more matching secondary records. The primary record is only output once, merged with the first secondary match. Subsequent secondary matches are ignored. If there is no secondary match, the primary record is discarded.
c) If multiple matching secondaries - Discard
This query fetches all primary records which have exactly one matching secondary record. The primary record is output once merged with the secondary match. If there is no secondary match, or multiple secondary matches, the primary record is discarded.
Outer Join
Outer join is a type of join in which both matching and non matching rows are returned. The values of all columns from the unmatched table in non-matching rows are set to NULL.
Using an Outer join on the Sales and Customer data sources you can explore the result of setting different options for the "If multiple matching secondaries".
a) If multiple matching secondaries - Repeat
Return to the same Join
diagram and edit the Join properties by
selecting the option Keep
from the "If no matching
secondary" and Repeat
from
the "If multiple matching secondaries" Combo box.
Enter "pri" in the Primary Prefix text box and "sec" in the Secondary Prefix text box.
We will leave the secondary tab as before, linking the customer_id fields between primary and secondary.
Click the Finish button and view the Result output. The output is shown in the Figure 4.9, “Outer Join Result”.
This query fetches all primary records. Each primary will be output as many times as there are matching secondary records. However, if there is no matching secondary record, the primary record will still be output once with the secondary fields set to NULL.
b) If multiple matching secondaries - Keep
This query fetches all records from the primary datasource and connects them to the first matching record from the secondary datasource. If no secondary record matches, the output record contains NULL for those values.
c) If multiple matching secondaries - Discard
This query fetches all records from the primary datasource. If there is exactly one matching secondary, it is merged. If there is zero or more than one matching secondary then NULL is used for each secondary field in the output.
Cross Join
A Cross Join merges each possible combination of records from the primary and secondary datasources. The output from a cross join can be very large!
Reusing the same Join
composite,
open the Properties and select the Cross Join check box.
You will notice the combo boxes are now disabled as
all combinations are generated.
Click the Finish button and view the Result output. The output is shown in the Figure 4.10, “Cross Join Result”.
The combination of all the records from both data sources are displayed. For instance, if there are 25 records in the Customer table and 40 records in the Sales table a total of 1000 (25*40) records are fetched by using Cross Join.