Hash partitioning on the same subset key

Post by adasgupta123 »

Hi,

In my job, a Remove Duplicates stage is followed directly by a Join stage.

I have set key-based hash partitioning on the first stage (Remove Duplicates), whose key is columns A and B. The key for the following Join stage is column B.

My question: do I need to repartition the data on column B at the Join stage, or can I use "Same" partitioning there, since the data is already hash-partitioned on columns A and B in the previous stage and B is a subset of (A, B)?

Thanks and Regards

Avik Dasgupta

Re: Hash partitioning on the same subset key

Post by BI-RMA »

Hi adasgupta123,

You could only use "Same" partitioning if the second input stream to your Join stage also contained column A and was also hash-partitioned on columns A and B. In that case, though, you would join on A and B rather than on B alone.

Since your second stream probably does not have column A, you will have to repartition stream 1 on column B so that identical values of B end up in the same partition in both streams. Hashing on A and B together only co-locates rows that agree on both columns.
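
To see why the repartition is needed, here is a minimal Python sketch that mimics hash partitioning outside of DataStage. Everything in it is an assumption for illustration: the helpers `partition_of` and `partitions_per_b`, the node count of 4, the sample rows, and the SHA-256 based hash, which is only a stand-in for whatever hash function the engine actually uses. It records, for each value of B, which partitions its rows land in when hashing on (A, B) and when hashing on B alone. With the hash on B alone, every value of B stays in exactly one partition, which is what a join on B requires.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical number of processing nodes


def partition_of(key_values, num_partitions=NUM_PARTITIONS):
    """Map a tuple of key values to a partition number.

    Stand-in hash for illustration only; DataStage uses its own hash function.
    """
    encoded = "|".join(str(v) for v in key_values).encode("utf-8")
    digest = hashlib.sha256(encoded).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partitions_per_b(rows, key_columns):
    """For each value of column B, collect the partitions its rows land in."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        seen[row["B"]].add(partition_of(key))
    return dict(seen)


# Made-up sample data: 50 values of A combined with 3 values of B.
rows = [{"A": a, "B": b} for a in range(50) for b in range(3)]

by_ab = partitions_per_b(rows, ["A", "B"])  # hash on A and B
by_b = partitions_per_b(rows, ["B"])        # hash on B only

for b in sorted(by_ab):
    print(f"B={b}: hash(A,B) -> partitions {sorted(by_ab[b])}, "
          f"hash(B) -> partitions {sorted(by_b[b])}")
```
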
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon

Post by adasgupta123 »

Hi Roland,

Thanks for your explanation. I got your point.

There is another, similar scenario; the only difference is that the second stage is an Aggregator stage. That is, a Remove Duplicates stage is followed directly by an Aggregator stage. The key for the first stage is columns A and B, and the key for the second stage is column B. The first stage is key-partitioned on A and B. Since the Aggregator has a single input link and performs no matching operation like a join, I think we can use "Same" partitioning for the second stage. Please correct me if I am wrong. I look forward to your advice.

Thanking you

Avik

Post by BI-RMA »

Hi Avik,

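Not quite. Hash partitioning on columns A and B only guarantees that rows with the same combination of A and B land in the same partition; rows that share a value in column B but differ in column A can still end up in different partitions. The Aggregator would then produce more than one group per value of B, so you need to repartition on column B here as well. Alternatively, hash-partition on column B alone ahead of the Remove Duplicates stage: rows with identical values in A and B then still share a partition, and the Aggregator can use "Same" partitioning.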
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon

Post by adasgupta123 »

Hi Roland,

Thanks a lot.

Regards

Avik