Page 1 of 1
Issue removing duplicates
Posted: Tue May 06, 2014 1:30 am
by A_SUSHMA
Hi All,
I have 2 input files. I want to compare two files and remove duplicates from the 2nd file and merge the files into single file.
input1
Empno | Ename
101 | ABC
102 | BBB
Input2
Empno
101
103
Output
Empno|Ename
101 | ABC
102 |BBB
103
Posted: Tue May 06, 2014 1:46 am
by ssnegi
Use Join stage "Full Outer Join"
Left Link - File1 Right Link - File2
Join Key - Empno
Then in transformer use derivation :
field empno : If left.empno is null then right.empno else left.empno
Posted: Tue May 06, 2014 2:45 am
by A_SUSHMA
Hi ssnegi,
I designed Job same like above but I am getting output is
Empno | Ename
101 | ABC
102 | BBB
Posted: Tue May 06, 2014 2:53 am
by ssnegi
you used Left Outer Join which is the default. from the drop down list select Full Outer Join.
In the mappings drag left.empno and right.empno
Posted: Tue May 06, 2014 3:04 am
by A_SUSHMA
Hi ssnegi,
I used Full Outer Join not Left Outer Join.
My Transformation condition is
If IsNull(dslink.left.empno) then dslink.right.empno else dslink.left.empno
Posted: Tue May 06, 2014 3:18 am
by ssnegi
Code: Select all
If IsNull(dslink.left.empno) or dslink.left.empno = 0 then dslink.right.empno else dslink.left.empno
how many rows do you see exiting the join stage from performance statistics. It should be 3 as per the sample data.
If this doesn't work then take the output of left.empno and right.empno into peek stage. See if the join in putting zero or null
Posted: Tue May 06, 2014 3:26 am
by A_SUSHMA
Joining is putting NULL.
Posted: Tue May 06, 2014 3:28 am
by ssnegi
How many rows are exiting the join stage ?
Posted: Tue May 06, 2014 3:37 am
by A_SUSHMA
3 rows (third row all columns are showing NULL)
Posted: Tue May 06, 2014 3:40 am
by ssnegi
In the join stage use hash partitioning sort ascending on empno field on both input links.
Then in transformer use below derivation :
Code: Select all
If Trim(dslink.left.empno) = '' or IsNull(dslink.left.empno) or dslink.left.empno = 0 then dslink.right.empno else dslink.left.empno
Posted: Tue May 06, 2014 7:17 am
by chulett
![Exclamation :!:](./images/smilies/icon_exclaim.gif)
Moved to the correct forum and changed the Subject to be more specific as everything here is a "DataStage Scenario".
Posted: Tue May 06, 2014 2:43 pm
by harishkumar.upadrasta
I think in your case you are getting Nulls for all the fields because of the wrong comparison used in the Derivation.
If the Empno Field is of Datatype "Integer" then the Transformer derivation should be compared to "0"
if dslnk.left.empno=0 then dslnk.right.empno else dslnk.left.empno
if the Datatype is Varchar you can use the below Derivation should be compared to ''(empty). In Most of the cases you will get '' not NULL as the value might be defaulted to Column default value.
if nulltoempty(dslnk.left.empno)='' then dslnk.right.empno else dslnk.left.empno