Merge Duplicate Records

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Ashish
Participant
Posts: 57
Joined: Tue Jan 31, 2006 1:16 am

Merge Duplicate Records

Post by Ashish »

I received two files from the source.

The structure of File1 is:
Col1 Col2 Col3
1 RS A
1 RD B
1 GD C
The structure of File2 is:
Col1 Col5
1 TB
1 TB

Using these two files, I have to create an output file with the following structure:
Col1 Col2 Col3 Col5
1 RS A TB
1 RD B TB
1 GD C

Can anyone help me create an output file with the structure above?

Cheers,
A
BugFree
Participant
Posts: 82
Joined: Wed Dec 13, 2006 6:02 am

Post by BugFree »

Hi,

This is left outer join logic.

Keep File1 as the left link and File2 as the right link of the Join/Lookup stage.
Map Col1, Col2, Col3 and Col5 to the target and you will get the result :) .
Ping me if I am wrong...
mahadev.v
Participant
Posts: 111
Joined: Tue May 06, 2008 5:29 am
Location: Bangalore

Post by mahadev.v »

BugFree, you would get a result, but it would contain 6 records instead of 3 :wink: , because each of the three File1 rows matches both File2 rows on Col1. Ashish, if you are sure that the records are already in the required order, you can generate a surrogate key on each link using a Row Generator stage and then join the data on that field.
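
To illustrate the row counts, here is a minimal Python sketch (plain Python, not DataStage code). The function names and the `seq` counter are hypothetical, and numbering the rows within each Col1 value is a small variation on the surrogate key idea. It assumes the rows in both files are already in the required order.

```python
from collections import defaultdict

# Sample data from the first post.
file1 = [(1, "RS", "A"), (1, "RD", "B"), (1, "GD", "C")]
file2 = [(1, "TB"), (1, "TB")]

def left_join_on_key(left, right):
    """Left outer join on Col1 only: each left row pairs with every matching right row."""
    out = []
    for col1, col2, col3 in left:
        matches = [col5 for key, col5 in right if key == col1]
        if not matches:
            out.append((col1, col2, col3, None))
        for col5 in matches:
            out.append((col1, col2, col3, col5))
    return out

def add_seq(rows):
    """Insert a running number per Col1 value after Col1 (hypothetical surrogate key)."""
    counters = defaultdict(int)
    out = []
    for row in rows:
        counters[row[0]] += 1
        out.append((row[0], counters[row[0]]) + tuple(row[1:]))
    return out

def left_join_on_key_and_seq(left, right):
    """Left outer join on (Col1, seq), so the n-th left row meets only the n-th right row."""
    lookup = {(key, seq): col5 for key, seq, col5 in add_seq(right)}
    return [(col1, col2, col3, lookup.get((col1, seq)))
            for col1, seq, col2, col3 in add_seq(left)]

print(len(left_join_on_key(file1, file2)))  # 6 rows: 3 left rows x 2 matching right rows
for row in left_join_on_key_and_seq(file1, file2):
    print(row)  # 3 rows; the GD row has None for Col5
```

Joining on the key plus the sequence number pairs rows by position, which is why the third File1 row is left without a Col5 value.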
BugFree
Participant
Posts: 82
Joined: Wed Dec 13, 2006 6:02 am

Post by BugFree »

Yes Mahadev, you are right :D . We need a unique value for each row in both files.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Can you not add a Remove Duplicates stage on the Right input?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
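
For comparison, a minimal Python sketch (not DataStage code) of what removing duplicates on the right input before a left outer join would produce on the sample data. The function names are hypothetical, and keeping the first row per key is an assumption.

```python
# Sample data from the first post.
file1 = [(1, "RS", "A"), (1, "RD", "B"), (1, "GD", "C")]
file2 = [(1, "TB"), (1, "TB")]

def remove_duplicates(rows):
    """Keep the first Col5 value seen for each Col1 value."""
    first = {}
    for key, col5 in rows:
        first.setdefault(key, col5)
    return first

def left_join(left, right_by_key):
    """Left outer join on Col1 against a de-duplicated right input."""
    return [(col1, col2, col3, right_by_key.get(col1)) for col1, col2, col3 in left]

result = left_join(file1, remove_duplicates(file2))
for row in result:
    print(row)
```

On this data the sketch returns three rows, each with TB in Col5, so the GD row differs from the output requested in the first post, where Col5 is empty for that row.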
Ashish
Participant
Posts: 57
Joined: Tue Jan 31, 2006 1:16 am

Post by Ashish »

No Ray, we can't add a Remove Duplicates (RDUP) stage on the right side.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Why not?
laconic
Participant
Posts: 1
Joined: Sat Feb 23, 2008 2:29 am

Can be done without Remove Duplicate

Post by laconic »

This can be done using the Merge stage, without removing duplicates:
File1
Col1 Col2 Col3
1 RS A
1 RD B
1 GD C

File2
Col1 Col5
1 TB
1 TB

Use a Merge stage with File2 on the Master link and File1 on the Update link. Set "Unmatched master mode" to "Drop". Key column: Col1.

Output
Col1 Col2 Col3 Col5
1 RS A TB
1 RD B TB
1 GD C TB