
Job design - compare rows between datasets

Posted: Wed Jul 26, 2017 5:08 am
by vgundavarapu
Hi,
I need feedback on a job design.
I have a requirement to compare rows between two datasets and return all rows from one dataset, indicating which one matched.


Example

Dataset A

Id   Code
1    1640
2    1427

Dataset B

Id   First code   Sec code   Third code   Fourth code
1    1427         8200       8000         50000
2    50000        1640       6000         80000
I want all the rows from dataset B, with an indication of which column matched.

I used the Join stage, but it generates four times as many records, and I then have to use Remove Duplicates.

I would like to use the Compare or Difference stage instead. Which one do you recommend?

Thanks

Posted: Thu Jul 27, 2017 4:24 am
by ray.wurlod
Did you partition your data using Hash or Modulus based on Id value? Did you sort the data based on Id value?
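
For reference, the matching logic described in the question can be sketched outside DataStage in a few lines of Python. This is only an illustration under assumptions, not a job design: it assumes a code in dataset B matches if it appears anywhere in dataset A (regardless of Id), and the names `flag_matches`, `CODE_COLUMNS`, and `matched_columns` are hypothetical. It shows that checking each code column against the set of known codes keeps exactly one output row per row of dataset B, so no duplicate removal is needed.

```python
# Sample data copied from the example in the question.
dataset_a = [
    {"id": 1, "code": 1640},
    {"id": 2, "code": 1427},
]

dataset_b = [
    {"id": 1, "first_code": 1427, "sec_code": 8200, "third_code": 8000, "fourth_code": 50000},
    {"id": 2, "first_code": 50000, "sec_code": 1640, "third_code": 6000, "fourth_code": 80000},
]

# Hypothetical column names for the four code columns of dataset B.
CODE_COLUMNS = ["first_code", "sec_code", "third_code", "fourth_code"]


def flag_matches(rows_a, rows_b):
    """Return every row of rows_b plus the list of code columns found in rows_a.

    Assumption: a match means the code value exists in any row of dataset A.
    """
    known_codes = {row["code"] for row in rows_a}
    result = []
    for row in rows_b:
        # Check each code column independently; the row itself is never multiplied.
        matched = [col for col in CODE_COLUMNS if row[col] in known_codes]
        result.append({**row, "matched_columns": matched})
    return result


flagged = flag_matches(dataset_a, dataset_b)
for row in flagged:
    print(row["id"], row["matched_columns"])
```

In job terms this is closer to one lookup per code column against dataset A than to a single join on pivoted codes, which is where the fourfold row count would come from.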