I have a couple of options to achieve a result, and I wonder whether either offers a significant performance advantage.
Option 1: Two Datasets, each through a Transformer, into a Funnel.
Code: Select all
DS_A-----TR_A
             \
              FU--->
             /
DS_B-----TR_B
Option 2: Two Datasets, each through two Column Exports and a Column Generator, into a Funnel.
Code: Select all
DS_A-----CE_A1-----CE_A2-----CG_A1
                                  \
                                   FU--->
                                  /
DS_B-----CE_B1-----CE_B2-----CG_B1
I have multiple columns in each dataset. I need some of these to combine into a single value forming an ID_SET, and the remaining columns to combine into a single value forming an ATTRIBUTE_SET. This information is funnelled together, but I want to keep the attributes separate, e.g.:
ID_SET
ATTRIBUTE_SET_A
ATTRIBUTE_SET_B
So I create an empty column in stream A to match stream B, and vice versa. I can accomplish all of this in a single Transformer stage per stream (option 1), but I have now hit the issue of NULLs in my attributes.
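To make the requirement concrete, here is a rough sketch in Python of what each stream's Transformer is doing, purely as a conceptual model (DataStage expression syntax is not shown, and the column names key1/key2/attr1/attr2 and the "|" delimiter are made up for illustration):

```python
# Conceptual model of Option 1: one Transformer per stream builds the
# ID_SET and ATTRIBUTE_SET strings before the streams are funnelled.
# Column names and delimiter are hypothetical.

def transform_stream_a(row):
    return {
        "ID_SET": "|".join([row["key1"], row["key2"]]),          # key columns combined
        "ATTRIBUTE_SET_A": "|".join([row["attr1"], row["attr2"]]),
        "ATTRIBUTE_SET_B": "",   # dummy column so both streams share one layout
    }

def transform_stream_b(row):
    return {
        "ID_SET": "|".join([row["key1"], row["key2"]]),
        "ATTRIBUTE_SET_A": "",   # dummy column for the funnel
        "ATTRIBUTE_SET_B": "|".join([row["attr1"], row["attr2"]]),
    }

# The Funnel is then just a union of the two transformed streams.
rows_a = [{"key1": "K1", "key2": "K2", "attr1": "x", "attr2": "y"}]
rows_b = [{"key1": "K1", "key2": "K2", "attr1": "p", "attr2": "q"}]
funnelled = [transform_stream_a(r) for r in rows_a] + \
            [transform_stream_b(r) for r in rows_b]
```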
I could assess each value that makes up the ATTRIBUTE_SET individually with NullToValue. But I wonder whether using the built-in null value settings in the Column Export stages would be quicker than having a Transformer assess every column. My option 2 plan is to:
- Create ID_SET in the first Column Export,
- Create ATTRIBUTE_SET in the second Column Export, then
- Create the dummy column(s) in a Column Generator for input to the funnel stage.
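Logically, the two approaches perform the same substitution; the question is only where it happens. A sketch of that equivalence (Python, conceptual only; the NULL default value and column names are assumptions, not DataStage settings):

```python
# Joining attribute columns where some values are NULL (None here).
# Per-column NullToValue in a Transformer (option 1) and the Column
# Export stage's null-value setting (option 2) both amount to this
# substitution; which stage does it is the performance question.

NULL_DEFAULT = ""  # assumed value to substitute for NULL

def null_to_value(value, default=NULL_DEFAULT):
    # Equivalent of wrapping each column in NullToValue in a Transformer.
    return default if value is None else value

def build_attribute_set(row, attr_cols, default=NULL_DEFAULT):
    # A Column Export with a per-column null default would do this
    # substitution as part of the export itself.
    return "|".join(null_to_value(row[c], default) for c in attr_cols)

row = {"attr1": "x", "attr2": None, "attr3": "z"}
print(build_attribute_set(row, ["attr1", "attr2", "attr3"]))  # x||z
```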
Would this be quicker than a single Transformer? I am dealing with data volumes in the high millions, and this function will be repeated many times across many data sources. Any thoughts?