Issue removing duplicates

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
A_SUSHMA
Participant
Posts: 43
Joined: Fri Apr 12, 2013 10:34 am

Issue removing duplicates

Post by A_SUSHMA »

Hi All,


I have 2 input files. I want to compare two files and remove duplicates from the 2nd file and merge the files into single file.



input1

Empno | Ename
101 | ABC
102 | BBB


Input2

Empno
101
103

Output
Empno|Ename

101 | ABC
102 |BBB
103
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia

Post by ssnegi »

Use Join stage "Full Outer Join"
Left Link - File1 Right Link - File2
Join Key - Empno
Then in transformer use derivation :
field empno : If left.empno is null then right.empno else left.empno
A_SUSHMA
Participant
Posts: 43
Joined: Fri Apr 12, 2013 10:34 am

Post by A_SUSHMA »

Hi ssnegi,


I designed Job same like above but I am getting output is

Empno | Ename
101 | ABC
102 | BBB
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia

Post by ssnegi »

you used Left Outer Join which is the default. from the drop down list select Full Outer Join.
In the mappings drag left.empno and right.empno
A_SUSHMA
Participant
Posts: 43
Joined: Fri Apr 12, 2013 10:34 am

Post by A_SUSHMA »

Hi ssnegi,

I used Full Outer Join not Left Outer Join.

My Transformation condition is

If IsNull(dslink.left.empno) then dslink.right.empno else dslink.left.empno
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia

Post by ssnegi »

Code: Select all

If IsNull(dslink.left.empno) or dslink.left.empno = 0 then dslink.right.empno else dslink.left.empno
how many rows do you see exiting the join stage from performance statistics. It should be 3 as per the sample data.
If this doesn't work then take the output of left.empno and right.empno into peek stage. See if the join in putting zero or null
A_SUSHMA
Participant
Posts: 43
Joined: Fri Apr 12, 2013 10:34 am

Post by A_SUSHMA »

Joining is putting NULL.
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia

Post by ssnegi »

How many rows are exiting the join stage ?
A_SUSHMA
Participant
Posts: 43
Joined: Fri Apr 12, 2013 10:34 am

Post by A_SUSHMA »

3 rows (third row all columns are showing NULL)
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia

Post by ssnegi »

In the join stage use hash partitioning sort ascending on empno field on both input links.

Then in transformer use below derivation :

Code: Select all

If Trim(dslink.left.empno) = '' or IsNull(dslink.left.empno) or dslink.left.empno = 0 then dslink.right.empno else dslink.left.empno
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:!: Moved to the correct forum and changed the Subject to be more specific as everything here is a "DataStage Scenario".
-craig

"You can never have too many knives" -- Logan Nine Fingers
harishkumar.upadrasta
Participant
Posts: 18
Joined: Tue Dec 25, 2012 10:39 pm
Location: Detroit,MI

Post by harishkumar.upadrasta »

I think in your case you are getting Nulls for all the fields because of the wrong comparison used in the Derivation.

If the Empno Field is of Datatype "Integer" then the Transformer derivation should be compared to "0"

if dslnk.left.empno=0 then dslnk.right.empno else dslnk.left.empno

if the Datatype is Varchar you can use the below Derivation should be compared to ''(empty). In Most of the cases you will get '' not NULL as the value might be defaulted to Column default value.

if nulltoempty(dslnk.left.empno)='' then dslnk.right.empno else dslnk.left.empno
Harish
Post Reply