mismatch in join stage

bgs · Post by **bgs** » Sun May 13, 2007 12:29 am

I have two inputs to the Join stage:
1. A Data Set that is hash partitioned on col A and sorted on col B.
2. A Transformer.

In the Join stage I set the partitioning type to "Same" on the link coming from the Data Set. On the link from the Transformer I selected hash partitioning on col A and sorting on col B.
The join key is col B. With these settings I am getting mismatches in the join. Could someone tell me whether there is a mistake in my settings?

Thanks

ray.wurlod · Post by **ray.wurlod** » Sun May 13, 2007 4:12 pm

Take a look at the score. DataStage is probably inserting tsort operators that sort on the hash partitioning key. Add a Sort stage set to "don't sort (previously sorted)" on the link from the Data Set.

nick.bond · Post by **nick.bond** » Sun May 13, 2007 4:45 pm

You need to partition by your key column.

Imagine you have this:

DataSet
ColA ColB
1 x
2 y

Transformer
ColA ColB
1 y
2 x

If you partition by ColA, it is likely that your matching records (x=x and y=y) will end up on separate partitions, so they will not match.
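The example above can be sketched in a few lines of Python. This is only a toy simulation, not DataStage code: `toy_hash`, `hash_partition` and `partitioned_join` are made-up helpers, two partitions are assumed, and the hash is a simple stand-in for whatever DataStage actually uses.

```python
def toy_hash(value, num_partitions):
    # Stand-in hash (assumption): sum of the value's bytes, modulo partition count.
    return sum(str(value).encode()) % num_partitions


def hash_partition(rows, key, num_partitions):
    # Place each row in the partition chosen by hashing one column.
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[toy_hash(row[key], num_partitions)].append(row)
    return parts


def partitioned_join(left, right, join_key, partition_key, num_partitions=2):
    # Each partition is joined on its own; rows never see other partitions.
    left_parts = hash_partition(left, partition_key, num_partitions)
    right_parts = hash_partition(right, partition_key, num_partitions)
    matches = []
    for left_rows, right_rows in zip(left_parts, right_parts):
        for l in left_rows:
            for r in right_rows:
                if l[join_key] == r[join_key]:
                    matches.append((l, r))
    return matches


dataset = [{"ColA": 1, "ColB": "x"}, {"ColA": 2, "ColB": "y"}]
transformer = [{"ColA": 1, "ColB": "y"}, {"ColA": 2, "ColB": "x"}]

# Join on ColB, but partition on ColA versus on ColB.
by_col_a = partitioned_join(dataset, transformer, "ColB", "ColA")
by_col_b = partitioned_join(dataset, transformer, "ColB", "ColB")
print(len(by_col_a), len(by_col_b))
```

In this toy setup, partitioning on ColA finds no matches while partitioning on ColB finds both.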

bgs · Post by **bgs** » Mon May 14, 2007 9:01 am

Hi Nick,
The value in colA is the last character of colB, so it is always between 0 and 9, and all records with the same colB value should fall in the same partition.
I tried repartitioning the data from the Data Set and it worked. But I thought using partition type "Same" should also work.
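The reasoning here can be checked with a small Python sketch. It is a toy model, not DataStage code: the colB values are made up, `toy_hash` is a stand-in hash, and four partitions are assumed. It only shows that when colA is derived from colB, hashing on colA keeps equal colB values together, provided both links use the same hash and the same number of partitions.

```python
def toy_hash(value, num_partitions):
    # Stand-in hash (assumption): sum of the value's bytes, modulo partition count.
    return sum(str(value).encode()) % num_partitions


# Made-up colB values; colA is defined as the last character of colB.
col_b_values = ["A1007", "B2317", "C0042", "D9912", "A1007"]
rows = [{"ColA": b[-1], "ColB": b} for b in col_b_values]

# Record which partitions each distinct colB value lands in when hashing on colA.
partitions_by_col_b = {}
for row in rows:
    partition = toy_hash(row["ColA"], 4)
    partitions_by_col_b.setdefault(row["ColB"], set()).add(partition)

# True when every colB value lives in exactly one partition.
co_located = all(len(parts) == 1 for parts in partitions_by_col_b.values())
print(co_located)
```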

nick.bond · Post by **nick.bond** » Mon May 14, 2007 4:31 pm

bgs wrote: "value in colA is the last character of colB"

Didn't know that!

Then I would also expect "Same" to work.

Are you running this job on the same number of nodes as the job that created the Data Set?

Can you do as Ray suggested and take a look at the score? Perhaps you could post it here.
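As to why the node count might matter, here is a toy Python illustration. It is not DataStage code, and it says nothing about how DataStage actually handles a Data Set read under a different configuration. It simply assumes a modulo stand-in hash and hypothetical node counts of 2 and 4, and shows that the same key can get a different partition number when the partition count differs.

```python
def toy_partition(key, num_nodes):
    # Stand-in for hash partitioning (assumption): key modulo node count.
    return key % num_nodes


# Hypothetical node counts: one side laid out over 2 nodes, the other over 4.
writer_nodes = 2
reader_nodes = 4

# colA takes the values 0-9 in this thread.
keys = range(10)

# Keys whose partition number differs between the two layouts.
misplaced = [
    k for k in keys
    if toy_partition(k, writer_nodes) != toy_partition(k, reader_nodes)
]
print(misplaced)  # [2, 3, 6, 7]
```

If something like this is happening, rows with those keys would sit in different partitions on the two links, which would fit with explicit repartitioning fixing the join.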