Reduce Transform

If you choose the Reduce type in a transform process, you can specify the following operations (or actions) on one or more fields of data. It reduces a set of records down to one summary record, which means corresponding rows will be merged into one row. Before starting this transform, make sure the fields have been sorted first, which minimizes memory use. This is because comparing each record with many other unsorted records requires a significantly large memory if there are huge volumes of data. However, if we only have to compare each record with the previous record, we can run through massive data without needing huge amounts of memory.

The following table shows an example of the input shared by all the operations under Reduce Transform:

Table 6.1. Shared input

Field 1Field 2Field 3Field 4
AB2.02.9
AB1.01.9
AB3.03.9
DE5.05.9
DE4.04.9

Average

This operation calculates the values from the specified fields into the average.

If you choose the Average action for Field 3 and Field 4, the following table shows the output of Table 6.1, “Shared input”:

Field 1Field 2Field 3Field 4
AB2.02.9
DE4.55.4

Comma Separated List

This operation merges the values from the specified fields into an unordered list separated by comma.

If you choose the Comma Separated List action for Field 3 and Field 4, the following table shows the output of Table 6.1, “Shared input”:

Field 1Field 2Field 3Field 4
AB2.0,1.0,3.02.9,1.9,3.9
DE5.0,4.05.9,4.9

Comma Separated Set

This operation merges the values from the specified fields into an ordered list separated by comma.

If you choose the Comma Separated Set action for Field 3 and Field 4, the following table shows the output of Table 6.1, “Shared input”:

Field 1Field 2Field 3Field 4
AB1.0,2.0,3.01.9,2.9,3.9
DE4.0,5.04.9,5.9

Count

This operation counts how many values are in the specified fields.

If you choose the Count action for Field 3 and Field 4, the following table shows the output of Table 6.1, “Shared input”:

Field 1Field 2Field 3Field 4
AB33
DE22

First

This operation extracts the first value from the specified fields.

If you choose the First action for Field 3 and Field 4, the following table shows the output of Table 6.1, “Shared input”:

Field 1Field 2Field 3Field 4
AB2.02.9
DE5.05.9

Last

This operation extracts the last value from the specified fields.

If you choose the Last action for Field 3 and Field 4, the following table shows the output of Table 6.1, “Shared input”:

Field 1Field 2Field 3Field 4
AB3.03.9
DE4.04.9

Max

This operation merges the values from the specified fields into the maximum.

If you choose the Max action for Field 3 and Field 4, the following table shows the output of Table 6.1, “Shared input”:

Field 1Field 2Field 3Field 4
AB3.03.9
DE5.05.9

Median

This operation merges the values from the specified fields into the median value. If there are only two values in a field, it returns the average directly.

If you choose the Median action for Field 3 and Field 4, the following table shows the output of Table 6.1, “Shared input”:

Field 1Field 2Field 3Field 4
AB2.02.9
DE4.55.4

Min

This operation merges the values from the specified fields into the minimum.

If you choose the Min action for Field 3 and Field 4, the following table shows the output of Table 6.1, “Shared input”:

Field 1Field 2Field 3Field 4
AB1.01.9
DE4.04.9

Product

This operation calculates the values from the specified fields into the product.

If you choose the Product action for Field 3 and Field 4, the following table shows the output of Table 6.1, “Shared input”:

Field 1Field 2Field 3Field 4
AB6.00021.489
DE20.0028.91

Standard Deviation

This operation calculates the values from the specified fields into the standard deviation.

If you choose the Standard Deviation action for Field 3 and Field 4, the following table shows the output of Table 6.1, “Shared input”:

Field 1Field 2Field 3Field 4
AB11
DE0.7071*0.7071*

*This is an approximate value of the output for the convenience of illustration.

Sum

This operation calculates the values from the specified fields into the summary.

If you choose the Sum action for Field 3 and Field 4, the following table shows the output of Table 6.1, “Shared input”:

Field 1Field 2Field 3Field 4
AB6.08.7
DE9.010.8

Variance

This operation calculates the values from the specified fields into the variance.

If you choose the Variance action for Field 3 and Field 4, the following table shows the output of Table 6.1, “Shared input”:

Field 1Field 2Field 3Field 4
AB1.001.00
DE0.500.50