Page 1 of 1

Issue removing duplicates

Posted: Tue May 06, 2014 1:30 am
by A_SUSHMA
Hi All,


I have 2 input files. I want to compare two files and remove duplicates from the 2nd file and merge the files into single file.



input1

Empno | Ename
101 | ABC
102 | BBB


Input2

Empno
101
103

Output
Empno|Ename

101 | ABC
102 |BBB
103

Posted: Tue May 06, 2014 1:46 am
by ssnegi
Use Join stage "Full Outer Join"
Left Link - File1 Right Link - File2
Join Key - Empno
Then in transformer use derivation :
field empno : If left.empno is null then right.empno else left.empno

Posted: Tue May 06, 2014 2:45 am
by A_SUSHMA
Hi ssnegi,


I designed Job same like above but I am getting output is

Empno | Ename
101 | ABC
102 | BBB

Posted: Tue May 06, 2014 2:53 am
by ssnegi
you used Left Outer Join which is the default. from the drop down list select Full Outer Join.
In the mappings drag left.empno and right.empno

Posted: Tue May 06, 2014 3:04 am
by A_SUSHMA
Hi ssnegi,

I used Full Outer Join not Left Outer Join.

My Transformation condition is

If IsNull(dslink.left.empno) then dslink.right.empno else dslink.left.empno

Posted: Tue May 06, 2014 3:18 am
by ssnegi

Code: Select all

If IsNull(dslink.left.empno) or dslink.left.empno = 0 then dslink.right.empno else dslink.left.empno
how many rows do you see exiting the join stage from performance statistics. It should be 3 as per the sample data.
If this doesn't work then take the output of left.empno and right.empno into peek stage. See if the join in putting zero or null

Posted: Tue May 06, 2014 3:26 am
by A_SUSHMA
Joining is putting NULL.

Posted: Tue May 06, 2014 3:28 am
by ssnegi
How many rows are exiting the join stage ?

Posted: Tue May 06, 2014 3:37 am
by A_SUSHMA
3 rows (third row all columns are showing NULL)

Posted: Tue May 06, 2014 3:40 am
by ssnegi
In the join stage use hash partitioning sort ascending on empno field on both input links.

Then in transformer use below derivation :

Code: Select all

If Trim(dslink.left.empno) = '' or IsNull(dslink.left.empno) or dslink.left.empno = 0 then dslink.right.empno else dslink.left.empno

Posted: Tue May 06, 2014 7:17 am
by chulett
:!: Moved to the correct forum and changed the Subject to be more specific as everything here is a "DataStage Scenario".

Posted: Tue May 06, 2014 2:43 pm
by harishkumar.upadrasta
I think in your case you are getting Nulls for all the fields because of the wrong comparison used in the Derivation.

If the Empno Field is of Datatype "Integer" then the Transformer derivation should be compared to "0"

if dslnk.left.empno=0 then dslnk.right.empno else dslnk.left.empno

if the Datatype is Varchar you can use the below Derivation should be compared to ''(empty). In Most of the cases you will get '' not NULL as the value might be defaulted to Column default value.

if nulltoempty(dslnk.left.empno)='' then dslnk.right.empno else dslnk.left.empno