Chapter 1. Transform

Table of Contents

Overview
Dynamic Transform
Merge and Retain
Example
Summary

Overview

In a transform process, you can manipulate data by performing a sequence of operations according to your specific requirements. You receive exactly what you need from the database by creating one or more processors in the Data Designer. You can view the records after transformation to verify the output. The result of each transformation may be used to overwrite the original field value or add an additional field. The result can be saved as a new data source or forwarded to reports and dashboards for presentation.

The data operations have an impact only upon the result, meanwhile the original data source remains unaffected unless you choose to overwrite. Transforms provide fast, convenient data manipulations in a more intuitive user interface, without the need of writing any scripts.

When you create or edit transforms, the order of transforms is important as some operations depend on fields created by previous steps. Where possible, the tool will prevent you from reordering items to produce an impossible sequence. For example, if Step 5 creates a field called X and Step 7 uses field X, you may be able to move Step 7 up one place to Step 6, but cannot move it above Step 5. Similarly, you cannot move Step 5 below Step 7. In addition, you cannot delete Step 5 because Step 7 depends on it. If you find that an ordering operation (move up or down) or delete operation is disabled, it means that making the change would yield a transform that could never succeed.

Dynamic Transform

Different from static transforms, which is applying an operation to a single field at a time, dynamic transforms work with several fields of a record at once. In static transforms, specify a constant value for data manipulation. In dynamic transforms, use ${@Field_Name} to call values from another field. To illustrate this, let's look into an example which calls values from Field 2 and adds them to values from Field 1. The operation is Transform > Decimal > Add :

Field 1 (Decimal)Field 2 (Decimal)ValueOutput (Decimal)
50.62491.1${@Field 2}51.7249
0.7215.2${@Field 2}5.921

Merge and Retain

Many transforms are related to merge or retain. Before starting to merge or retain, make sure the fields have been sorted first, which minimizes memory use. This is because comparing each record with many other unsorted records requires a significantly large memory if there are huge volumes of data. However, if we only have to compare each record with the previous record, we can run through massive data without needing huge amounts of memory.

Merge operates to ensure that no information is lost. When one field is being operated on, the merge will be between only those records where all other fields are the same. This ensures that the record that is the result of the merge does not discard any information. If you need to merge across variations in non-essential fields, those fields will need to be discarded first.

Retain works similarly with merge. The only difference is that retain does not discard duplicate records after the manipulation.

Example

Using transforms, how does a company calculate how much it receives from sales each month? In the following example we will take sales values that occur throughout the month from a legacy system which for historical reasons may include spaces around the date values. Also, some date values are not available and come into the system as nulls. This is meant to illustrate real-world situations where the data needs to be cleansed before we can begin properly processing it.

  • Step 1: Trim off the extra spaces, tabs and new lines in the string values. The operation is Transform > String > Trim. Select In Place. By doing this, we keep only the trimmed string values.

    The input data is read from a server which may include extra spaces at the beginning and end. The date strings may look like the following. Therefore, we need to trim these spaces before parsing the date:

    • " 2011-07-31 "

    • " 2011-09-30 "

    In the following table, the When field includes string representations of dates when a product was sold and the Value field shows the sale price. Later on we will introduce a Commission field, which is how much of the sale price goes directly to the sales person as commission.

    The When field shows the output:

    When (String)Value (Decimal)
    2011-03-282000.00
    2011-03-073000.00
    2011-3-191000.00
    2000.00
    1000.00
    3000.00
    2011-2-162000.00
    2011-2-271000.00
    2011-02-032000.00
    4000.00
    1000.00
    2011-05-122000.00
    2011-05-161000.00
    2011-05-071000.00
    2011-05-243000.00
    2011-6-072000.00
    2011-6-151000.00
    2011-6-211000.00
    2011-07-092000.00
    2011-07-143000.00
    2011-07-261000.00
    2011-07-291000.00
    1000.00
    2000.00
    2011-09-171000.00
    2011-09-052000.00
    2011-09-111000.00
    2011-09-233000.00
    2011-10-162000.00
    2011-10-043000.00
    2011-10-271000.00
  • Step 2: Convert the strings from the When field into dates. The operation is Transform > String > To date. Select In Place. The new values may look the same as the trimmed strings, but these are now Date objects, which means we can manipulate them as dates rather than just as sequences of characters.

    In the following table, the When field shows the output:

    When (Date)Value (Decimal)
    2011-03-282000.00
    2011-03-073000.00
    2011-03-191000.00
    2000.00
    1000.00
    3000.00
    2011-02-162000.00
    2011-02-271000.00
    2011-02-032000.00
    4000.00
    1000.00
    2011-05-122000.00
    2011-05-161000.00
    2011-05-071000.00
    2011-05-243000.00
    2011-06-072000.00
    2011-06-151000.00
    2011-06-211000.00
    2011-07-092000.00
    2011-07-143000.00
    2011-07-261000.00
    2011-07-291000.00
    1000.00
    2000.00
    2011-09-171000.00
    2011-09-052000.00
    2011-09-111000.00
    2011-09-233000.00
    2011-10-162000.00
    2011-10-043000.00
    2011-10-271000.00
  • Step 3: Extract the month values from the Date field. The operation is Transform > Date > Month.

    In the following table, a new field named "Index" is created to show the output:

    When (String)Value (Decimal)Index (Integer)
    2011-03-282000.003
    2011-03-073000.003
    2011-03-191000.003
     2000.00
     1000.00
     3000.00
    2011-02-162000.002
    2011-02-271000.002
    2011-02-032000.002
     4000.00
     1000.00
    2011-05-122000.005
    2011-05-161000.005
    2011-05-071000.005
    2011-05-243000.005
    2011-06-072000.006
    2011-06-151000.006
    2011-06-211000.006
    2011-07-092000.007
    2011-07-143000.007
    2011-07-261000.007
    2011-07-291000.007
     1000.00
     2000.00
    2011-09-171000.009
    2011-09-052000.009
    2011-09-111000.009
    2011-09-233000.009
    2011-10-162000.0010
    2011-10-043000.0010
    2011-10-271000.0010
  • Step 4: Show the months in long names from the When field. The operation is Transform > Date > Month name (long). Select In Place.

    In the following table, the When field shows the output:

    When (String)Value (Decimal)Index (Integer)
    March2000.003
    March3000.003
    March1000.003
    2000.00 
    1000.00 
    3000.00 
    February2000.002
    February1000.002
    February2000.002
    4000.00 
    1000.00 
    May2000.005
    May1000.005
    May1000.005
    May3000.005
    June2000.006
    June1000.006
    June1000.006
    July2000.007
    July3000.007
    July1000.007
    July1000.007
    1000.00 
    2000.00 
    September1000.009
    September2000.009
    September1000.009
    September3000.009
    October2000.0010
    October3000.0010
    October1000.0010

    You will notice that the nulls in the When field and the Index field are still nulls (represented by empty cells). Unless the transform specifically processes nulls, all transforms will just pass nulls through unchanged. In the next step we will change those nulls into values we can work with.

  • Step 5: Set the Null values from the When field to "Unknown". The operation is Transform > String > Null to string. Select In Place.

    In the following table, the When field shows the output:

    When (String)Value (Decimal)Index (Integer)
    March2000.003
    March3000.003
    March1000.003
    Unknown2000.00 
    Unknown1000.00 
    Unknown3000.00 
    February2000.002
    February1000.002
    February2000.002
    Unknown4000.00 
    Unknown1000.00 
    May2000.005
    May1000.005
    May1000.005
    May3000.005
    June2000.006
    June1000.006
    June1000.006
    July2000.007
    July3000.007
    July1000.007
    July1000.007
    Unknown1000.00 
    Unknown2000.00 
    September1000.009
    September2000.009
    September1000.009
    September3000.009
    October2000.0010
    October3000.0010
    October1000.0010
  • Step 6: Sort the table by month index. The operation is Transform > Sort. Choose the Index field to sort by. Choose the Ascending order. Leave the Max Memory (MB) field blank. By doing this, this operation will sort the sales values without a maximum memory limit.

    The following table shows the output:

    When (String)Value (Decimal)Index (Integer)
    Unknown2000.00
    Unknown1000.00
    Unknown3000.00
    Unknown4000.00
    Unknown1000.00
    Unknown1000.00
    Unknown2000.00
    February2000.002
    February1000.002
    February2000.002
    March2000.003
    March3000.003
    March1000.003
    May2000.005
    May1000.005
    May1000.005
    May3000.005
    June2000.006
    June1000.006
    June1000.006
    July2000.007
    July3000.007
    July1000.007
    July1000.007
    September1000.009
    September2000.009
    September1000.009
    September3000.009
    October2000.0010
    October3000.0010
    October1000.0010
  • Step 7: Create a new field named "Commission" in the original data source. Sum merge the sales values and commissions, and display a single total for each month. The operation is Transform > Reduce. Select the Sum action on the Value field and the Commission field. Leave the action blank on the other fields. By doing this, you can add up the sales values and commissions and view the total for each month. The rows have been merged down to one row per group, where the month names and indexes are the groups.

    Alternatively, you can also sum retain on the sales values and commissions, and then perform a Discard Duplicates Transform to remove the redundant rows. You will receive the same output.

    The following table shows the sorted table with the Commission field added:

    When (String)Value (Decimal)Commission (Decimal)Index (Integer)
    Unknown2000.00100.20 
    Unknown1000.0050.10 
    Unknown3000.00150.30 
    Unknown4000.00200.40 
    Unknown1000.0050.10 
    Unknown1000.0050.10 
    Unknown2000.00100.20 
    February2000.00100.202
    February1000.0050.102
    February2000.00100.202
    March2000.00100.203
    March3000.00150.303
    March1000.0050.103
    May2000.00100.205
    May1000.0050.105
    May1000.0050.105
    May3000.00150.305
    June2000.00100.206
    June1000.0050.106
    June1000.0050.106
    July2000.00100.207
    July3000.00150.307
    July1000.0050.107
    July1000.0050.107
    September1000.0050.109
    September2000.00100.209
    September1000.0050.109
    September3000.00150.309
    October2000.00100.2010
    October3000.00150.3010
    October1000.0050.1010

    The following table shows the output of the Reduce Transform:

    When (String)Value (Decimal)Commission (Decimal)Index (Integer)
    Unknown14000.00701.40 
    February5000.00250.502
    March6000.00300.603
    May7000.00350.705
    June4000.00200.404
    July7000.00350.707
    September7000.00350.709
    October6000.00300.6010
  • Step 8: Subtract the commission from the sales value in each month. The operation is Transform > Decimal > Subtract. Specify ${@Commission} as the value to be subtracted. In this case we aren't subtracting a fixed amount (a static transform) but an amount read from another field - the Commission field. This is termed a dynamic transform, because the result requires input from more than one field.

    In the following table, a new field named "Received" is created to show the output:

    When (String)Value (Decimal)Commission (Decimal)Index (Integer)Received (Decimal)
    Unknown14000.00701.40 13298.60
    February5000.00250.5024749.50
    March6000.00300.6035699.40
    May7000.00350.7056649.30
    June4000.00200.4043799.60
    July7000.00350.7076649.30
    September7000.00350.7096649.30
    October6000.00300.60105699.40

    Now we can save the result of the transformations as a new data source for later processing. The transforms are repeatable, which means every month you can re-run the transforms to receive the most updated set of numbers, rather than create a sequence of transforms but run them only once.

Summary

The following is a summary of the transformations we used in the overview:

  • String trim: remove whitespace at the beginning and end of date strings.

  • String to date: build date objects.

  • Date to month: extract month index from date objects.

  • Date to month name (long): replace date objects with long month names.

  • Null to string: set the null month names to “Unknown”.

  • Sort by month index: view the sales values throughout each month in the ascending order.

  • Reduce: sum merge the sales values and commissions, reduce the redundant rows and display a single total for each month.

  • Subtract: subtract the commissions from the sales values to calculate how much the company receives from sales each month.

You can find more details on these and all other supported transformations in the following chapters.