mismatch in join stage

bgs
Participant
Posts: 22
Joined: Sat Feb 05, 2005 9:43 pm


Post by bgs »

I have two inputs to a Join stage:
1. A Data Set that is hash partitioned on col A and sorted on col B.
2. The output of a Transformer.

In the Join stage I set the partitioning type to "Same" for the link coming from the Data Set, and for the link from the Transformer I selected hash partitioning on col A with a sort on col B.
The join key is col B. With these settings I am getting mismatches in the join. Could someone tell me whether there is a mistake in my settings?

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Take a look at the score. DataStage is probably inserting tsort operators that sort on the hash partitioning key. Add a Sort stage set to "don't sort (previously sorted)" on the link from the Data Set.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
nick.bond
Charter Member
Posts: 230
Joined: Thu Jan 15, 2004 12:00 pm
Location: London

Post by nick.bond »

You need to partition by your key column.

Imagine you have this

DataSet
ColA ColB
1 x
2 y

Transformer
ColA ColB
1 y
2 x

If you partition by ColA, the matching records (x=x and y=y) are likely to land on separate partitions, so they will not match.
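The example above can be simulated outside DataStage. What follows is a minimal Python sketch, not DataStage code: `toy_hash`, `partition`, and `join_per_partition` are hypothetical helpers, and the hash is a simple stand-in rather than the algorithm the hash partitioner really uses. It only illustrates that a join compares rows within the same partition.

```python
from collections import defaultdict

def toy_hash(value):
    # Deterministic stand-in hash; NOT the algorithm DataStage uses.
    return sum(ord(ch) for ch in str(value))

def partition(rows, column, n_parts):
    """Group rows into partitions by hashing one column."""
    parts = defaultdict(list)
    for row in rows:
        parts[toy_hash(row[column]) % n_parts].append(row)
    return parts

def join_per_partition(left, right, key, n_parts):
    """Join only rows that sit in the same partition index."""
    matches = []
    for p in range(n_parts):
        for l_row in left.get(p, []):
            for r_row in right.get(p, []):
                if l_row[key] == r_row[key]:
                    matches.append((l_row, r_row))
    return matches

# The two inputs from the example above.
dataset = [{"ColA": 1, "ColB": "x"}, {"ColA": 2, "ColB": "y"}]
transformer = [{"ColA": 1, "ColB": "y"}, {"ColA": 2, "ColB": "x"}]
N = 2  # assumed number of partitions

by_col_a = join_per_partition(
    partition(dataset, "ColA", N), partition(transformer, "ColA", N), "ColB", N)
by_col_b = join_per_partition(
    partition(dataset, "ColB", N), partition(transformer, "ColB", N), "ColB", N)

print(len(by_col_a))  # 0: matching ColB values ended up in different partitions
print(len(by_col_b))  # 2: partitioning on the join key keeps matches together
```

With this toy hash, partitioning both inputs on ColA yields no matches, while partitioning on ColB finds both.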
Regards,

Nick.
bgs
Participant
Posts: 22
Joined: Sat Feb 05, 2005 9:43 pm

Post by bgs »

Hi Nick,
The value in colA is the last character of colB, which will be a digit between 0 and 9, so all the records with the same value should fall in the same partition.
I tried repartitioning the data from the Data Set and it worked. But I thought using partition type "Same" should also work.
nick.bond
Charter Member
Posts: 230
Joined: Thu Jan 15, 2004 12:00 pm
Location: London

Post by nick.bond »

"value in colA is the last character of colB"
Didn't know that!

...then I would also expect "Same" to work.

Are you running this job on the same number of nodes as the job that created the Data Set?

Can you do as Ray suggested and take a look at the score? Perhaps you could post it here.
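To show why the node count matters, here is a small Python sketch, again a toy model and not DataStage behaviour: `toy_partition` and `colocated` are hypothetical helpers, the hash is a stand-in, and the ColB values are made up. It assumes each input is partitioned on ColA (the last character of ColB) and checks whether a given key gets the same partition index on both inputs.

```python
def toy_partition(value, n_parts):
    # Deterministic stand-in for a hash partitioner (an assumption,
    # not the algorithm DataStage uses).
    return sum(ord(ch) for ch in str(value)) % n_parts

def colocated(col_b_keys, n_parts_left, n_parts_right):
    """True if every key gets the same partition index on both inputs
    when each side is partitioned on ColA = last character of ColB."""
    return all(
        toy_partition(b[-1], n_parts_left) == toy_partition(b[-1], n_parts_right)
        for b in col_b_keys
    )

keys = ["AB10", "CD23", "EF47", "GH95"]  # hypothetical ColB values

same_nodes = colocated(keys, 4, 4)  # both sides partitioned 4 ways
diff_nodes = colocated(keys, 4, 2)  # one side 4 ways, the other 2 ways

print(same_nodes)  # True
print(diff_nodes)  # False
```

In this model, equal partition counts keep matching keys together, while a 4-way layout on one side and a 2-way layout on the other does not. Whether that is what happens in the actual job is something only the score can confirm.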
Regards,

Nick.