Hello friends,
I am trying to process multiple rows at a time in the Transformer stage and insert rows into the target file based on the result.
For example if the data in the input file is:
Cust_Name    Emp_Id
---------    ------
A            123
B            123
C            123
A            456
B            789
A            435
B            435
C            435
==============================================
If the Emp_Id is the same for multiple customers, then I want to insert NULL in the Emp_Id column of the target file.
My intention is to populate the target file as:
Cust_Name    Emp_Id
---------    ------
A            NULL
B            NULL
C            NULL
A            456
B            789
A            NULL
B            NULL
C            NULL
===========================================
Can anyone help me with this?
Thanks a lot in advance.
-Yamini
Process multiple rows at a time in 'Transformer' stage
Since the paradigm for DataStage is to process one row at a time, your situation presents a moderate challenge. You can probably accomplish this several ways, but if you want to use a job and your source is sequential, then you might consider the design below.
The top stream (Seq_Master to Aggregator to Hash) will process first; then the main stream will process. Note that Seq_Master is the same file in both streams: it is used to build your hash file and is also the input to the main stream.
This really only works if you want every Emp_Id that appears more than once to be nulled.
Code: Select all
Seq_Master -------> Aggregator -------> Hash
                                          |
                                          v
Seq_Master -------> Transform ---------> Seq
In your Aggregator, group by Emp_Id and count Cust_Name. Write the result to the hash file, with Emp_Id as the hash key. Note that you use the same hash file as both output here and lookup input in the main stream, and you can rename the Cust_Name column to cnt or count or something more meaningful for your purposes. Basically we are getting a row count for each group of Emp_Ids, so although the source column is Cust_Name, at this point it really becomes a count.
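For readers outside DataStage, the Aggregator-to-hash step above amounts to building a count per Emp_Id keyed like the hash file. A minimal Python sketch, assuming the column names and sample rows from the question (the data literal is only the example data):

```python
from collections import Counter

# Sample (Cust_Name, Emp_Id) rows from the question's input file
rows = [
    ("A", "123"), ("B", "123"), ("C", "123"),
    ("A", "456"), ("B", "789"),
    ("A", "435"), ("B", "435"), ("C", "435"),
]

# Equivalent of the Aggregator stage: group by Emp_Id and count rows.
# The resulting mapping plays the role of the hash file keyed on Emp_Id,
# with the count column standing in for the renamed Cust_Name.
emp_id_counts = Counter(emp_id for _, emp_id in rows)
```

Here `emp_id_counts["123"]` is 3 and `emp_id_counts["456"]` is 1, which is exactly what the hash lookup in the main stream needs.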
Perform a lookup in the main stream against your hash file based on Emp_Id. The derivation on the right side for Emp_Id would look like -
Code: Select all
IF HASHLOOKUP.count > 1 THEN @NULL ELSE INLINK.Emp_Id
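Put together, the two passes (Aggregator count, then Transformer lookup with the IF/THEN derivation above) can be sketched in Python for checking the logic outside DataStage. Column names come from the example, and `None` stands in for @NULL:

```python
from collections import Counter

# (Cust_Name, Emp_Id) rows from the question's input file
rows = [
    ("A", "123"), ("B", "123"), ("C", "123"),
    ("A", "456"), ("B", "789"),
    ("A", "435"), ("B", "435"), ("C", "435"),
]

# Pass 1 -- Aggregator to hash file: count rows per Emp_Id.
counts = Counter(emp_id for _, emp_id in rows)

# Pass 2 -- Transformer with hash lookup, mirroring the derivation
# IF HASHLOOKUP.count > 1 THEN @NULL ELSE INLINK.Emp_Id
output = [
    (cust, None if counts[emp_id] > 1 else emp_id)
    for cust, emp_id in rows
]

for cust, emp_id in output:
    print(cust, "NULL" if emp_id is None else emp_id)
```

Running this reproduces the target table from the question: the 123 and 435 groups come out as NULL, while the single-occurrence 456 and 789 rows pass through unchanged.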
Please let me know if this helps or you need further information.
I tested this and it worked, and I have used similar logic before.
Regards,
Mike Hester
mhester@petra-ps.com