Join Stage - Paritionning question

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
tardifma
Premium Member
Premium Member
Posts: 23
Joined: Tue Jan 24, 2006 10:53 am

Join Stage - Paritionning question

Post by tardifma »

Hi.

I have a question about jobs that contains multiple join stages and their partition...

Actually, I have a PX job that contains 2 join stages (left outer join). I've set the partitionning on both stages to Hash...

The first Join has 5 keys... Key1,Key2,Key3,Key4,Key5 on the two input links. I used these keys for partitionning as well.

The second has 4 keys... the keys of the partition is a subset of the first join... Key1,Key2,Key3,Key4 (these keys are used for both, partitionning and key of the join).

It seems that if I do not check the option Perform sort on the second join stage... it does not find the corresponding record in the right part of the join (but it is there and the key definition is good)...

In other jobs, I use to have more than one join stage... but they don't have the same keys... So I didn't need to check this option on... and it always worked fine...

This is weird... I tough by partitionning correctly the join stage, it should always work fine??? Is this a Datastage bug???

Thanks.
Math
The Brute
thompsonp
Premium Member
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

No it is not a bug.

You do not need to partition by all 5 keys in the first job (but they must be sorted by all 5). The aim is to partition as infrequently as possible to meet the needs of each stage and get a good spread of data across nodes.

Try partitioning by fields 1 to 4 in the first join and sorting by all 5 fields. Set partitioning to SAME in the 2nd join.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Say you have four nodes. And the data for the first join be,

key1 key2 key3 key4 key5
w x y z 1
w x y z 2
w x y z 3
w x y z 4

If you make hash partition on all the 5 keys, each record will go into each partion (most probably). If you make hash partiton on first four records all the above four recod will go into a single partion.
And that is the reason you dont find the match for four records execpt one, in the second join (if the subset of keys is chosed).
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Post Reply