different outputs from difference stage

emma · Post by **emma** » Fri Oct 02, 2009 9:43 am

I have two Data Set inputs feeding a Difference stage.
Every time I run the job, it gives me a different number of output rows.

The Difference stage input is hash partitioned and sorted on the keys.

What am I doing wrong?

ray.wurlod · Post by **ray.wurlod** » Fri Oct 02, 2009 4:42 pm

Are the data sorted?

mgendy · Post by **mgendy** » Mon Jan 11, 2010 5:15 am

Check that the data is sorted and partitioned on all of the difference keys, and use the proper partitioning method. Hash partitioning is recommended if you have multiple difference keys.
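
To illustrate why the partitioning matters, here is a minimal Python sketch. It is not DataStage code and not how the Difference stage is implemented; it is a toy model that assumes each partition only compares the "before" and "after" rows that landed in it, and it only reports "before" rows with no matching key in "after". The function names, the `id` key column, and the sample data are all made up for the example.

```python
import random
import zlib


def hash_partition(rows, key, n_parts):
    """Send every row with the same key value to the same partition."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        # crc32 is stable across runs, unlike Python's salted str hash
        idx = zlib.crc32(str(row[key]).encode()) % n_parts
        parts[idx].append(row)
    return parts


def random_partition(rows, n_parts, rng):
    """Place rows without regard to the key (hypothetical 'wrong' setup)."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[rng.randrange(n_parts)].append(row)
    return parts


def partitioned_difference(before_parts, after_parts, key):
    """Per partition, keep 'before' rows whose key is absent from 'after'."""
    missing = []
    for before, after in zip(before_parts, after_parts):
        after_keys = {row[key] for row in after}
        missing.extend(row for row in before if row[key] not in after_keys)
    return missing


before = [{"id": i} for i in range(100)]       # ids 0..99
after = [{"id": i} for i in range(10, 110)]    # ids 10..109
N_PARTS = 4

# Both inputs hash partitioned on the difference key.
hash_count = len(partitioned_difference(
    hash_partition(before, "id", N_PARTS),
    hash_partition(after, "id", N_PARTS),
    "id",
))

# Both inputs placed randomly; each seed stands in for a separate job run.
random_counts = []
for seed in (1, 2, 3):
    rng = random.Random(seed)
    random_counts.append(len(partitioned_difference(
        random_partition(before, N_PARTS, rng),
        random_partition(after, N_PARTS, rng),
        "id",
    )))

print("hash partitioned:", hash_count)
print("randomly partitioned:", random_counts)
```

In this model, hash partitioning both inputs on the key gives the true count (the 10 ids that exist only in `before`). With key-independent placement, a row whose match landed in another partition is reported as a difference, so the count is inflated and depends on where the rows happened to go.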

ray.wurlod · Post by **ray.wurlod** » Mon Jan 11, 2010 4:08 pm

Presumably the input Data Sets are not changing between runs?

Can you give a couple of examples of input and output row counts? For example:

Run   Before   After   Output
 1      2342    2344      412
 2      2342    2344      414