Difference stage working

akarsh
Participant
Posts: 51
Joined: Fri May 09, 2008 4:03 am
Location: Pune

Post by akarsh »

Hi,
I have a question about how the Difference stage works in PX. The manual says that the output of the Difference stage contains data from the after data set, together with the diff code.
But when I build a sample job to test this, the output contains data from the before data set.
The before and after data sets have the same column definitions, with the same column names.
Please help me with this.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Only unique column names can appear on the output. Change column names on at least one of the inputs. Also (in input link execution order) verify that Before and After are indeed the links you believe them to be.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by akarsh »

ray.wurlod wrote:Only unique column names can appear on the output. Change column names on at least one of the inputs. Also (in input link execution order) verify that Before and After are indeed the links you belie ...

Thanks for replying, Ray.
I have checked, and the output of the Difference stage does contain the data from the before data set. As I said, according to the PX manual it should contain the data from the after data set, so I am confused now.
andrewn
Premium Member
Posts: 14
Joined: Tue Jul 10, 2007 3:19 am
Location: UK

Post by andrewn »

I know this thread is a few months old but thought I'd post what I have found out, having come up against a very similar issue.

IBM support told me:
when using the diff operator values from duplicate field names including key and value fields are copied from the before data set only.
They also referred me to the documentation, which I had already read, although I had come to a different conclusion about how the stage should work!

In essence, the key and value fields *must* have the same names in the before and after data.

That means if you have records in the after data which don't exist in the before data - i.e. inserts - what you actually get in the output data is a set of records with all the key and value fields set to their data type default.

It also means that any value fields always contain the data from the before record. So if you have output records identified as "edits" you will see all the value fields containing the before data.
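
To make the behaviour described above concrete, here is a minimal sketch in plain Python. It is not DataStage code and not how the diff operator is implemented; it only models what IBM support described, for the case where key and value fields have the same names on both inputs. The function name `difference`, the labels "insert", "delete", "edit" and "copy" (stand-ins for the stage's numeric diff codes), the sample records, and the use of 0 and the empty string as type defaults are all assumptions made for illustration.

```python
def difference(before, after, keys, values):
    """Toy model of the reported behaviour (hypothetical, not the real operator).

    Key and value fields on the output are always copied from the before
    record. When there is no before record (an insert), they are filled
    with the default for their Python type instead.
    """
    def key_of(rec):
        return tuple(rec[k] for k in keys)

    before_by_key = {key_of(r): r for r in before}
    after_by_key = {key_of(r): r for r in after}
    fields = list(keys) + list(values)

    rows = []
    for key in sorted(before_by_key.keys() | after_by_key.keys()):
        b = before_by_key.get(key)
        a = after_by_key.get(key)
        if b is None:
            # Insert: nothing to copy from the before side, so every
            # key and value field falls back to its type default.
            row = {f: type(a[f])() for f in fields}
            row["diff"] = "insert"
        else:
            # Delete, copy and edit all carry the before values.
            row = {f: b[f] for f in fields}
            if a is None:
                row["diff"] = "delete"
            elif all(a[v] == b[v] for v in values):
                row["diff"] = "copy"
            else:
                row["diff"] = "edit"
        rows.append(row)
    return rows


# Made-up sample data: id is the key field, name is the value field.
before = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}]
after = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Rob"}, {"id": 4, "name": "Dee"}]

rows = difference(before, after, keys=["id"], values=["name"])
for row in rows:
    print(row)
```

In this model the edit for id 2 comes out with the before name ("Bob", not "Rob"), and the insert for id 4 comes out with only defaults, which are the two symptoms described above.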

It's working as designed according to IBM but it means the Difference stage is not very good at telling you the differences :roll:

Post by ray.wurlod »

Even so, the non-key columns can have different names, and would then be reported on the output link.
I guess this is why the documentation also states that the Difference stage is being phased out, and is not to be preferred.