different outputs from difference stage

Posted: Fri Oct 02, 2009 9:43 am
by emma
I have two Data Set inputs feeding a Difference stage.
Every time I run the job, it gives me a different number of output rows.

The Difference stage inputs are hash-partitioned and sorted on the keys.

What am I doing wrong?

Posted: Fri Oct 02, 2009 4:42 pm
by ray.wurlod
Are the data sorted?

Posted: Mon Jan 11, 2010 5:15 am
by mgendy
Check that the data is sorted and partitioned on all of the difference keys, and use the proper partitioning method. Hash partitioning is recommended if you have multiple difference keys.
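
To illustrate why the partitioning has to be on the difference keys, here is a rough Python model of a per-partition comparison. It is only a sketch, not how DataStage implements the stage: the function names and the change labels ("copy", "insert", "delete", "edit") are made up for the example, and the keys are plain integers so that `key % n` can stand in for a hash function.

```python
from collections import Counter

def diff_partition(before, after):
    """Compare (key, value) rows that sit in the same partition."""
    b, a = dict(before), dict(after)
    codes = Counter()
    for key in b.keys() | a.keys():
        if key not in a:
            codes["delete"] += 1      # only in the before set
        elif key not in b:
            codes["insert"] += 1      # only in the after set
        elif a[key] != b[key]:
            codes["edit"] += 1        # same key, different value
        else:
            codes["copy"] += 1        # unchanged
    return codes

def hash_partition(rows, n):
    """Key-based: rows with the same key always share a partition."""
    parts = [[] for _ in range(n)]
    for key, value in rows:
        parts[key % n].append((key, value))
    return parts

def round_robin_partition(rows, n):
    """Position-based: the partition depends on row order, not on the key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def partitioned_diff(before, after, partitioner, n):
    """Run the comparison partition by partition and add up the counts."""
    total = Counter()
    for b, a in zip(partitioner(before, n), partitioner(after, n)):
        total += diff_partition(b, a)
    return total

# Hypothetical data: key 1 is deleted, key 9 is inserted, keys 2..8 are unchanged.
before = [(k, "v%d" % k) for k in range(1, 9)]
after = [(k, "v%d" % k) for k in range(2, 10)]

print(diff_partition(before, after))                              # single-partition reference
print(partitioned_diff(before, after, hash_partition, 2))         # same counts as the reference
print(partitioned_diff(before, after, round_robin_partition, 2))  # matching keys are split up
```

With the key-based partitioner the totals match the single-partition result. With the position-based one, matching keys land in different partitions and are reported as deletes and inserts instead. If the assignment of rows to partitions can also change from run to run, the counts may change with it.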

Posted: Mon Jan 11, 2010 4:08 pm
by ray.wurlod
Presumably the input Data Sets are not changing between runs?

Can you give a couple of examples of input and output row counts? For example:

Run   Before   After   Output
 1      2342    2344      412
 2      2342    2344      414