Difference stage working

akarsh · Post by **akarsh** » Mon Dec 22, 2008 7:08 am

Hi,
I have a question about how the Difference stage works in Px. The manual says that the output of the Difference stage contains the data from the after data set, together with the diff code.
But when I build a sample job to test this, the output contains the data from the before data set.

Both the before and after data sets have the same column definitions, with the same column names.
Please help me with this.

ray.wurlod · Post by **ray.wurlod** » Mon Dec 22, 2008 2:09 pm

Only unique column names can appear on the output. Change column names on at least one of the inputs. Also (in input link execution order) verify that Before and After are indeed the links you believe them to be.

akarsh · Post by **akarsh** » Mon Dec 22, 2008 11:45 pm

ray.wurlod wrote:Only unique column names can appear on the output. Change column names on at least one of the inputs. Also (in input link execution order) verify that Before and After are indeed the links you belie ...

Thanks for replying, Ray.

I have checked, and the output of the Difference stage contains the data from the before data set. But as I said, according to the Px manual it should contain the data from the after data set. I am confused now.

andrewn · Post by **andrewn** » Mon Apr 20, 2009 5:46 am

I know this thread is a few months old but thought I'd post what I have found out, having come up against a very similar issue.

IBM support told me:

when using the diff operator values from duplicate field names including key and value fields are copied from the before data set only.

They also referred me to the documentation, which I had read, but I had come to a different conclusion about how the stage should work!

In essence, the key and value fields *must* have the same names in the before and after data.

That means if you have records in the after data which don't exist in the before data - i.e. inserts - what you actually get in the output data is a set of records with all the key and value fields set to their data type default.

It also means that any value fields always contain the data from the before record. So if you have output records identified as "edits" you will see all the value fields containing the before data.

It's working as designed according to IBM, but it means the Difference stage is not very good at telling you the differences.
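
To make the behaviour described above concrete, here is a toy model in plain Python. It is not DataStage code and does not call any DataStage API; it only mimics what IBM support described, under these assumptions: both inputs use the same key and value column names, the diff codes are shown as text labels (the real stage writes configurable numeric codes), and "data type default" is modelled as 0 for integers and an empty string for strings. The names `toy_difference`, `type_default`, and the sample rows are all made up for illustration.

```python
from typing import Dict, List

Record = Dict[str, object]


def type_default(value: object) -> object:
    """Return the zero value of value's type (0 for int, "" for str)."""
    return type(value)()


def toy_difference(before: List[Record], after: List[Record],
                   key: str, values: List[str]) -> List[Record]:
    """Mimic the reported behaviour when both inputs share column names.

    Same-named key and value columns are only ever copied from the
    before record, so inserts come out holding type defaults.
    """
    before_by_key = {rec[key]: rec for rec in before}
    after_by_key = {rec[key]: rec for rec in after}
    columns = [key, *values]
    output = []

    for k in sorted(set(before_by_key) | set(after_by_key)):
        b = before_by_key.get(k)
        a = after_by_key.get(k)
        if b is None:
            # Insert: there is no before record to copy from, so every
            # same-named column falls back to its type default.
            row = {name: type_default(a[name]) for name in columns}
            row["diff"] = "insert"
        elif a is None:
            # Delete: the before record is all there is.
            row = {name: b[name] for name in columns}
            row["diff"] = "delete"
        else:
            # Copy or edit: the comparison uses both records, but the
            # output values still come from the before record.
            changed = any(b[v] != a[v] for v in values)
            row = {name: b[name] for name in columns}
            row["diff"] = "edit" if changed else "copy"
        output.append(row)
    return output


before_rows = [
    {"id": 1, "name": "old"},
    {"id": 2, "name": "same"},
    {"id": 3, "name": "gone"},
]
after_rows = [
    {"id": 1, "name": "new"},
    {"id": 2, "name": "same"},
    {"id": 4, "name": "added"},
]

result = toy_difference(before_rows, after_rows, key="id", values=["name"])
for row in result:
    print(row)
```

In this sketch the row flagged as an edit (id 1) still shows "old" rather than "new", and the inserted row (id 4 in the after data) comes out with id 0 and an empty name, which matches the two symptoms described above.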

ray.wurlod · Post by **ray.wurlod** » Mon Apr 20, 2009 3:39 pm

Even so, the non-key columns can have different names, and would then be reported on the output link.
I guess this is why the documentation also states that the Difference stage is being phased out, and is not to be preferred.