Process multiple rows at a time in 'Transformer' stage

Post questions here related to DataStage Server Edition, covering areas such as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Process multiple rows at a time in 'Transformer' stage

Post by yaminids »

Hello friends,

I am trying to process multiple rows at a time in the 'Transformer' stage and insert rows into the target file based on the result.

For example, if the data in the input file is:

Cust_Name    Emp_Id
---------    ------
A            123
B            123
C            123
A            456
B            789
A            435
B            435
C            435


==============================================

If the Emp_Id is the same for multiple customers, then I want to insert NULL in the Emp_Id column of the target file.

My intention is to populate the target file as:


Cust_Name    Emp_Id
---------    ------
A            NULL
B            NULL
C            NULL
A            456
B            789
A            NULL
B            NULL
C            NULL


===========================================

Can anyone help me with this?

Thanks a lot in advance
-Yamini
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

Since the paradigm for DataStage is to process one row at a time, your situation presents a moderate challenge. You can probably accomplish this in several ways, but if you want to use a job and your source is sequential, then you might consider using the following -

Code:


Seq_Master ------->Aggregator-------->Hash
                                      |
                                      v
                Seq_Master-------->Transform------------>Seq

The top stream, or Seq_Master to Aggregator to Hash, will process first, then the main stream will process. Note that Seq_Master is the same file; it is used to build your hash file and is also the input to the main stream.

In your Aggregator you will want to count Cust_Name and group by Emp_Id. Write this to the hash file with Emp_Id as the key. Note that you use the same hash file as both input and output, and you can change the column name from Cust_Name to cnt or count or something more meaningful for what you are doing. Basically, we are getting a count for each group of Emp_Ids, so while the column starts out as Cust_Name, at this point it really becomes a count.
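For comparison outside DataStage, the Aggregator-to-hash step is just counting rows per Emp_Id. Here is a minimal Python sketch of that first pass, purely as an illustration, not part of the job itself; the file name "input.csv" and the delimited-file layout are assumptions matching the example data above.

Code:

from collections import Counter
import csv

def build_emp_id_counts(path):
    # Pass 1 (the Aggregator -> hash file step): count how many
    # rows share each Emp_Id. The hash file is keyed on Emp_Id
    # and stores this count per key.
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row["Emp_Id"]] += 1
    return counts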

Perform a lookup in the main stream against your hash file based on Emp_Id. The derivation on the right side for Emp_Id would look like -

Code:

IF HASHLOOKUP.count > 1 THEN @NULL ELSE INLINK.Emp_Id
This really only works if you want every Emp_Id that is shared by more than one customer to be nulled.
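Continuing the same illustrative Python sketch (again an assumption, not the actual DataStage job), the hash-file lookup and the Transformer derivation amount to a second pass that nulls Emp_Id whenever the pass-1 count is greater than 1:

Code:

import csv

def null_shared_emp_ids(path, counts):
    # Pass 2 (the hash lookup + derivation): emit each row,
    # replacing Emp_Id with NULL when its count shows the value
    # is shared by more than one row.
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            emp_id = row["Emp_Id"]
            yield row["Cust_Name"], "NULL" if counts[emp_id] > 1 else emp_id

# Usage (hypothetical file name):
#   counts = build_emp_id_counts("input.csv")
#   for name, emp in null_shared_emp_ids("input.csv", counts):
#       print(name, emp)

Run against the sample data above, this yields exactly the expected output: 123 and 435 become NULL because they appear on multiple rows, while 456 and 789 pass through unchanged.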

Please let me know if this helps or if you need further information.

I tested this and it worked, and I have used similar logic before.

Regards,