Route one of Duplicates based on condition

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
sarathi
Participant
Posts: 27
Joined: Thu Feb 11, 2010 4:14 pm
Location: Minneapolis

Route one of Duplicates based on condition

Post by sarathi »

I am having DTS issue where i need to Route one of Duplicate value to different file and then remove duplicates. Here is an examples of how data looks for columns Code & Status

Case 1
Code Status
1DKSSK2 Removed
1DKSSK2 Removed
1DKSSK2 Removed

Case 2
Code Status
1DKSSK1 Implemented
1DKSSK1 Removed
1DKSSK1 Removed

Case 1 - I have to fetch first row with Status =Removed to different file and eliminate other two rows
Case 2 - Fetch first row with Status=Implemented to different file and eliminate other rows

Sorted data on Code/Status and can i use SWITCH stage to Extract first row and then 'Filter Stage' to Eliminate rows where Status <> Removed.

Will this work ? Any better way to do it

Thanks in advance

Sarathi
sarathi
Participant
Posts: 27
Joined: Thu Feb 11, 2010 4:14 pm
Location: Minneapolis

Post by sarathi »

Do anyone had task to do this type of Logic and found solution for same.
vinodn
Charter Member
Charter Member
Posts: 93
Joined: Tue Dec 13, 2005 11:00 am

Post by vinodn »

In the transformer separate two types of records using constraint
1) Status = Removed
2) Status = Implemented

Then use Remove Duplicate Stage and pull either first/last record
vinodn
Charter Member
Charter Member
Posts: 93
Joined: Tue Dec 13, 2005 11:00 am

Post by vinodn »

Have a lookup on the first set where STATUS='Removed' with the second set, if you find any record then PASS this records to an Sequential Stage which would be your STATUS=IMPLEMENTED records and wherever you won't find PASS them to a Reject Link which would be your STATUS=REMOVED records

Transformer -> Removed ->Rem_dup
->Implemented ->Rem_dup ->Lkp ->IMPLEMENTED
|
REMOVED
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You could also look into the Sort stage with the Add Key Change Column option enabled to see if that helped simplify the identification of your 'duplicates'.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sarathi
Participant
Posts: 27
Joined: Thu Feb 11, 2010 4:14 pm
Location: Minneapolis

Post by sarathi »

Thanks for reply. I have only one query which pulls data and you mentioned to go for lookup below. Does it mean LOOKUP Stage ?
As i don't have Two Sources prefer Transformer Stage.

- If Rem_status='Removed' and rem_status=imp_status Then Pass this record to seq stage else Reject record.
Let me know above logic is corrected.

Have a lookup on the first set where STATUS='Removed' with the second set, if you find any record then PASS this records to an Sequential Stage which would be your STATUS=IMPLEMENTED records and wherever you won't find PASS them to a Reject Link which would be your STATUS=REMOVED records
rohithmuthyala
Participant
Posts: 57
Joined: Wed Oct 21, 2009 4:46 am
Location: India

Post by rohithmuthyala »

In transformer itself, one can compare the code value with that of the previous code value using stage variables. If same then you can directly drop it, else send the record to the link.
Rohith
ReachKumar
Participant
Posts: 29
Joined: Wed Jan 06, 2010 7:18 am

Post by ReachKumar »

To compare and drop the remaining records in transformer, the records should be sorted.

If the data is sorted already, then i think you can also use Remove duplicates Stage to retain first record based on key column(First column in your case).
Regards,
Kumar
sarathi
Participant
Posts: 27
Joined: Thu Feb 11, 2010 4:14 pm
Location: Minneapolis

Post by sarathi »

Below is method i followed and did not work. Am i making any mistakes here ?

1- Sorted data on Key and Status

2- In the transformer created separate two types of records using constraint
1) Status = Removed
2) Status = Implemented
3 - Duplicate state and pulled only Last record

4. Created STAGE VARIABLE in NEXT Transformer Stage
If (Impl_STATUS <> Rem_STATUS) Then Impl_STATUS
and linked to Impl_Status

5. Linked one output to Sequential and another to my old Lookup

Sarathi
thompsonp
Premium Member
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

Are these the only possible values of STATUS?

If so use a sort stage (on CODE) and generate a key change column. The data must be partitioned on CODE.

You will need two output links with constraints to meet your described logic.

The keyChange column has a value of 1 for the first occurence of a code, then the rest are 0.

Constraint 1: keyChange = 1 AND STATUS = "Removed"
Constraint 2: keyChange= 1 AND STATUS = "Implemented"
Post Reply