Different Options, Best Performance

Daddy Doma
Premium Member
Posts: 62
Joined: Tue Jun 14, 2005 7:17 pm
Location: Australia

Post by Daddy Doma »

Hi Guyz,

I have a couple of options to achieve a result, and wonder whether either would have a significant performance advantage.

Option 1: Two Datasets, through transformer, into funnel.

DS_A-----TR_A
             \
              FU--->
             /
DS_B-----TR_B

Option 2: Two Datasets, through column export, then again, then column generator, into funnel:

DS_A-----CE_A1-----CE_A2-----CG_A1
                                  \
                                   FU--->
                                  /
DS_B-----CE_B1-----CE_B2-----CG_B1

What am I trying to do?

I have multiple columns in each dataset. I need some of these to combine into a single value to form an ID_SET. I want the other columns to combine into a single value to form an ATTRIBUTE_SET. This information is funnelled together, but I want to keep the attributes separate, e.g.

ID_SET
ATTRIBUTE_SET_A
ATTRIBUTE_SET_B

So, I create an empty column in stream A to match stream B and vice versa. I can accomplish all this in a single Transformer stage per stream (option 1), but have now hit the issue of Nulls in my attributes.
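
As a rough illustration of the layout described above (a Python sketch, not DataStage code), each stream produces rows with the same three columns so they can be funnelled together. The `|` delimiter, the sample values, and the helper name `to_funnel_row` are assumptions made for the example, not details from the job.

```python
from typing import Dict, List, Optional

DELIM = "|"  # assumed delimiter; the post does not say how values are combined


def to_funnel_row(source: str, id_cols: List[str],
                  attr_cols: List[str]) -> Dict[str, Optional[str]]:
    """Build one funnel-compatible row for stream 'A' or 'B' (hypothetical helper)."""
    row: Dict[str, Optional[str]] = {
        "ID_SET": DELIM.join(id_cols),
        "ATTRIBUTE_SET_A": None,  # stays empty for stream B
        "ATTRIBUTE_SET_B": None,  # stays empty for stream A
    }
    # Only the attribute column belonging to this stream is populated.
    row["ATTRIBUTE_SET_" + source] = DELIM.join(attr_cols)
    return row


# Made-up sample rows, one per stream.
stream_a = [to_funnel_row("A", ["1001", "AU"], ["red", "large"])]
stream_b = [to_funnel_row("B", ["2002", "NZ"], ["42"])]

# The funnel simply combines rows that share the same column layout.
funnelled = stream_a + stream_b
```

Because both streams emit identical column names, the combining step needs no per-stream logic; the dummy column is what makes the two layouts match.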

I could assess each value that makes up the ATTRIBUTE_SET individually with NullToValue. But I wonder if using the built-in Null Value settings in the Column Export stages would be quicker than having a transformer assess every column. My option 2 plan is to:

- Create ID_SET in the first Column Export,
- Create ATTRIBUTE_SET in the second Column Export, then
- Create the dummy column(s) in a Column Generator for input to the funnel stage.
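
Whichever stage does the work, the per-column null handling amounts to replacing each null before the values are combined. A minimal Python sketch of that idea follows; the function names, the `N/A` default, and the `|` delimiter are illustrative assumptions, and `null_to_value` here only mimics the concept rather than reproducing the DataStage function.

```python
from typing import List, Optional


def null_to_value(value: Optional[str], default: str) -> str:
    """Return `default` when the value is null (None), else the value itself."""
    return default if value is None else value


def build_attribute_set(values: List[Optional[str]],
                        default: str = "", delim: str = "|") -> str:
    """Replace nulls column by column, then join into one ATTRIBUTE_SET string."""
    return delim.join(null_to_value(v, default) for v in values)


# Made-up attribute values with a null in the middle column.
attribute_set = build_attribute_set(["red", None, "large"], default="N/A")
```

Every column is inspected once in either design, so the open question in the thread is the per-row overhead of the stage doing the inspection, not the amount of logic.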

Would this be quicker than a single transformer? I am dealing with data volumes in the high millions and this function will be repeated many times for many data sources - any thoughts?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If you're running 7.5.1 or later I'd try the single Transformer stage. They did address most of the performance drags in this stage for this release.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

The Column Export stage might be slower than a Transformer; it's hard to say. It handles the null-to-value replacement, but it might also do some other conversion validations you don't need. The Transformer should be the simpler design, and the search and replace option lets you add a lot of NullToValue calls quickly.