Remove Duplicates -Rejected Records

Ush
Participant
Posts: 55
Joined: Tue Dec 04, 2007 3:15 am

Post by Ush »

Hi

I have the following set of records:

Empno name
1 Ash
1 Ush
2 Reeta
3 x
4 Y

I have to retain the first record for each Empno and capture the rejected (duplicate) records. I can't use the Remove Duplicates stage since it does not have a reject link.

Please help
Nripendra Chand
Premium Member
Posts: 196
Joined: Tue Nov 23, 2004 11:50 pm
Location: Sydney (Australia)

Post by Nripendra Chand »

You can use stage variables in a Transformer stage to get this result. Make sure the records are hash partitioned and sorted on the required keys before the stage variable logic runs.
-Nripendra Chand
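
To illustrate why the partitioning matters, here is a minimal Python sketch (not DataStage syntax). It assumes an integer Empno and uses a simple modulus as a stand-in for the real hash function; the function name `hash_partition_and_sort` is hypothetical. The point is that all records with the same key land in the same partition, so a per-partition comparison with the previous key can see every duplicate.

```python
from typing import List, Tuple

Record = Tuple[int, str]  # (Empno, name)


def hash_partition_and_sort(records: List[Record], num_partitions: int) -> List[List[Record]]:
    """Route each record to a partition by key, then sort every partition on the key."""
    partitions: List[List[Record]] = [[] for _ in range(num_partitions)]
    for rec in records:
        # Assumption: modulus on the integer key stands in for a hash function.
        partitions[rec[0] % num_partitions].append(rec)
    # sorted() is stable, so records with equal keys keep their input order.
    return [sorted(part, key=lambda r: r[0]) for part in partitions]


rows = [(1, "Ash"), (1, "Ush"), (2, "Reeta"), (3, "x"), (4, "Y")]
parts = hash_partition_and_sort(rows, 2)
for index, part in enumerate(parts):
    print(index, part)
```

With two partitions, both Empno 1 records end up together in the same partition, which is what the stage variable logic relies on.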
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

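Sort the data on your key field, then use a Transformer stage that stores the previous record's key value in a stage variable, and use the output link constraints to route records accordingly: the first record for each key goes to the main output, and any later record with the same key goes to the reject output.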
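
The logic can be sketched in Python as follows (an illustration only, not Transformer syntax). The variable `last_key` plays the role of the stage variable and the two lists stand in for the two output links; the function name `split_first_and_rejects` is hypothetical.

```python
from typing import Iterable, List, Tuple

Record = Tuple[int, str]  # (Empno, name)

_UNSET = object()  # sentinel meaning "no previous key seen yet"


def split_first_and_rejects(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Keep the first record per Empno and send later duplicates to the reject list."""
    kept: List[Record] = []
    rejects: List[Record] = []
    last_key = _UNSET  # stands in for the stage variable
    # sorted() is stable, so the first record per key is the first one in the input.
    for rec in sorted(records, key=lambda r: r[0]):
        if rec[0] != last_key:
            kept.append(rec)      # key changed: first record of its group
        else:
            rejects.append(rec)   # same key as the previous record: duplicate
        last_key = rec[0]         # update after the comparison, as a stage variable would be
    return kept, rejects


rows = [(1, "Ash"), (1, "Ush"), (2, "Reeta"), (3, "x"), (4, "Y")]
kept, rejects = split_first_and_rejects(rows)
print(kept)     # [(1, 'Ash'), (2, 'Reeta'), (3, 'x'), (4, 'Y')]
print(rejects)  # [(1, 'Ush')]
```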
Nirmala84
Participant
Posts: 3
Joined: Thu Jan 10, 2008 7:02 am

Re: Remove Duplicates -Rejected Records

Post by Nirmala84 »

[quote="Ush"]Hi

I have the following set of records:

Empno name
1 Ash
1 Ush
2 Reeta
3 x
4 Y

I have to retain the first record for each Empno and capture the rejected (duplicate) records. I can't use the Remove Duplicates stage since it does not have a reject link.

Please help[/quote]


Also, please help me fetch only the unique records from the Remove Duplicates stage.

For example,

If I have the following set of records:

Empno name
1 Ash
1 Ush
2 Reeta
3 x
4 Y

The output of the Remove Duplicates stage should contain the following records:

Empno
1
2
3
4

Please help.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Nirmala84 - the same method of using stage variables applies.
ccatania
Premium Member
Posts: 68
Joined: Thu Sep 08, 2005 5:42 am
Location: Raleigh

Post by ccatania »

The Remove Duplicates stage can retain either the first or the last record of each group of duplicates.
r_arora
Participant
Posts: 20
Joined: Tue Mar 04, 2008 10:30 am

Post by r_arora »

Here is a suggestion:
Use a Sort stage: sort on EmpNo and set the clusterKeyChange option to "True". Then put constraints on your Transformer so that all records with a clusterKeyChange value of 1 go to one dataset and the remaining records go to the other dataset. You will get two datasets: one with a single record for each employee number and the other with all the duplicate records.
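
As a rough Python sketch of this two-step idea (flag first, route second), assuming the input is already sorted on EmpNo: the helper `add_key_change` is a hypothetical stand-in for the key change column, emitting 1 for the first record of each key group and 0 otherwise, and the list comprehensions stand in for the Transformer constraints.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[int, str]  # (EmpNo, name)

_UNSET = object()  # sentinel meaning "no previous key seen yet"


def add_key_change(sorted_records: Iterable[Record]) -> Iterator[Tuple[Record, int]]:
    """Yield (record, flag); flag is 1 on the first record of each key group, else 0."""
    previous = _UNSET
    for rec in sorted_records:
        flag = 1 if rec[0] != previous else 0
        previous = rec[0]
        yield rec, flag


rows = [(1, "Ash"), (1, "Ush"), (2, "Reeta"), (3, "x"), (4, "Y")]
flagged: List[Tuple[Record, int]] = list(add_key_change(sorted(rows, key=lambda r: r[0])))

# Constraints: flag == 1 goes to the first dataset, flag == 0 to the second.
unique = [rec for rec, flag in flagged if flag == 1]
duplicates = [rec for rec, flag in flagged if flag == 0]
print(unique)      # [(1, 'Ash'), (2, 'Reeta'), (3, 'x'), (4, 'Y')]
print(duplicates)  # [(1, 'Ush')]
```

Separating the flag from the routing means the downstream constraint only has to test a single column instead of tracking the previous key itself.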