Hash partitioning on the same subset key

adasgupta123 · Post by **adasgupta123** » Thu Sep 15, 2011 3:41 am

Hi,

In my job, two stages are placed back to back: a Remove Duplicates stage followed by a Join stage.

I have used key-based hash partitioning for the first stage (Remove Duplicates). The key for that stage is columns A and B. For the following Join stage, the key is B.

My question is: do I need to repartition the data again in the Join stage on column B, or can I use "Same" partitioning there, since the data was already hash-partitioned in the previous stage on columns A and B, and B is a subset of A, B?

Thanks and Regards

Avik Dasgupta

BI-RMA · Post by **BI-RMA** » Thu Sep 15, 2011 4:35 am

Hi adasgupta123,

You could only use Same partitioning if the second input stream to your Join stage also contained column A and was also hash-partitioned on columns A and B. But then you could also keep the join key as A and B.

Since your second stream probably does not have column A, you will have to repartition stream 1 so that identical values of column B end up in the same partitions for both streams.
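To see why the repartition is needed, here is a minimal sketch in Python that simulates hash partitioning outside DataStage. Everything in it is an assumption for illustration: the partition count, the sample rows, the helper names `hash_partition` and `partitions_per_value`, and the SHA-256 based hash, which is not the hash function the parallel engine uses. It only shows the general behaviour: hashing on (A, B) spreads rows with the same B value over several partitions, while hashing on B alone keeps them together.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical degree of parallelism


def hash_partition(rows, key_cols, n=NUM_PARTITIONS):
    """Assign each row to a partition from a hash of the chosen key columns.

    The hash here is illustrative only; it stands in for whatever
    hash function the engine applies to the partitioning key.
    """
    parts = defaultdict(list)
    for row in rows:
        key = "|".join(str(row[c]) for c in key_cols)
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        pid = int.from_bytes(digest[:4], "big") % n
        parts[pid].append(row)
    return parts


def partitions_per_value(parts, col):
    """Map each value of `col` to the set of partitions that contain it."""
    seen = defaultdict(set)
    for pid, rows in parts.items():
        for row in rows:
            seen[row[col]].add(pid)
    return seen


# Made-up sample data: each B value occurs with many different A values.
rows = [{"A": a, "B": b} for a in range(50) for b in ("x", "y", "z")]

# Partitioned on (A, B): rows sharing a B value land in several partitions.
by_ab = partitions_per_value(hash_partition(rows, ["A", "B"]), "B")

# Partitioned on B only: each B value lives in exactly one partition.
by_b = partitions_per_value(hash_partition(rows, ["B"]), "B")

print("hash on (A, B):", {b: sorted(p) for b, p in by_ab.items()})
print("hash on B:     ", {b: sorted(p) for b, p in by_b.items()})
```

One design note that follows from this: rows with equal (A, B) necessarily have equal B, so hash-partitioning on B alone in front of the Remove Duplicates stage would also keep all duplicates of an (A, B) pair in one partition, and a downstream stage keyed on B could then use Same.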

adasgupta123 · Post by **adasgupta123** » Thu Sep 15, 2011 5:45 am

Hi Roland,

Thanks for your explanation. I got your point.

There is another, similar scenario. The only difference is that the second stage is an Aggregator stage, so the Remove Duplicates and Aggregator stages are placed back to back. The key for the first stage is columns A and B, and for the second stage it is B. The first stage is hash-partitioned on A, B. Since the second stage (Aggregator) has a single input link and there is no matching operation like a join, I think we can go ahead with Same partitioning for the second stage. Please correct me if I am wrong. Looking forward to your advice.

Thanking you

Avik

BI-RMA · Post by **BI-RMA** » Thu Sep 15, 2011 5:51 am

Hi Avik,

Correct. In this case all rows with identical values in column B will be in the same partition without repartitioning.

adasgupta123 · Post by **adasgupta123** » Thu Sep 15, 2011 5:59 am

Hi Roland,

Thanks a lot.

Regards

Avik

DSXchange
