Join Stage - Paritionning question

tardifma · Post by **tardifma** » Thu Jun 08, 2006 5:51 am

Hi.

I have a question about jobs that contains multiple join stages and their partition...

Actually, I have a PX job that contains 2 join stages (left outer join). I've set the partitionning on both stages to Hash...

The first Join has 5 keys... Key1,Key2,Key3,Key4,Key5 on the two input links. I used these keys for partitionning as well.

The second has 4 keys... the keys of the partition is a subset of the first join... Key1,Key2,Key3,Key4 (these keys are used for both, partitionning and key of the join).

It seems that if I do not check the option Perform sort on the second join stage... it does not find the corresponding record in the right part of the join (but it is there and the key definition is good)...

In other jobs, I use to have more than one join stage... but they don't have the same keys... So I didn't need to check this option on... and it always worked fine...

This is weird... I tough by partitionning correctly the join stage, it should always work fine??? Is this a Datastage bug???

Thanks.
Math

thompsonp · Post by **thompsonp** » Thu Jun 08, 2006 6:01 am

No it is not a bug.

You do not need to partition by all 5 keys in the first job (but they must be sorted by all 5). The aim is to partition as infrequently as possible to meet the needs of each stage and get a good spread of data across nodes.

Try partitioning by fields 1 to 4 in the first join and sorting by all 5 fields. Set partitioning to SAME in the 2nd join.

kumar_s · Post by **kumar_s** » Thu Jun 08, 2006 8:10 am

Say you have four nodes. And the data for the first join be,

key1 key2 key3 key4 key5
w x y z 1
w x y z 2
w x y z 3
w x y z 4

If you make hash partition on all the 5 keys, each record will go into each partion (most probably). If you make hash partiton on first four records all the above four recod will go into a single partion.
And that is the reason you dont find the match for four records execpt one, in the second join (if the subset of keys is chosed).