same partitioning not working correctly in joiner stage

radhika7983 · Post by **radhika7983** » Mon Nov 22, 2010 7:34 am

Hi,

I have 5 datasets that are created in other jobs; all of them are hash partitioned on one key column, A. In one job I join all 5 datasets using a single Join stage. I perform a left outer join with one dataset as the left (main) input; the other 4 are the right (reference) inputs. The join uses key A plus one more key, B, so I join the 5 datasets on 2 keys. Because the datasets are already partitioned, I am using Same partitioning on all the links, since all the values of B for a given A will be in the same partition as A. However, this doesn't seem to work. Although there are matching records in all the datasets, I am not getting any records from the reference links; only the records from the left input come out. I tried removing all the reference links but one, so that the join had only the main dataset and one reference dataset, and it seemed to work fine. As soon as I added one more reference dataset, the partitioning stopped working correctly.
When I repartitioned all the datasets (hash) in the join job, it worked fine. I am confused! The datasets are already partitioned, so do I really need to repartition them again? Or should I use a separate join for each reference dataset? In that case I will end up using 5 joins. Please suggest!
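
The reasoning above (rows hashed on A alone are also co-located for the key pair A, B) can be checked with a small sketch. This is plain Python, not DataStage: the `hash_partition` helper, the CRC32 hash, and the sample rows are all assumptions made for illustration, and the engine's real hash function is different. It suggests the assumption only holds when every dataset was written with the same number of partitions.

```python
import zlib
from collections import defaultdict

def hash_partition(rows, key, num_partitions):
    """Hypothetical helper: place each row in a partition by hashing one key column."""
    parts = defaultdict(list)
    for row in rows:
        h = zlib.crc32(str(row[key]).encode())  # stand-in hash, not the engine's
        parts[h % num_partitions].append(row)
    return parts

def key_locations(parts):
    """Map each (A, B) pair to the partition number that holds it."""
    return {(r["A"], r["B"]): p for p, rows in parts.items() for r in rows}

# Made-up sample data: the same (A, B) pairs exist in both datasets.
main = [{"A": a, "B": b} for a in range(1, 9) for b in ("x", "y")]
ref = [{"A": a, "B": b, "val": a * 10} for a in range(1, 9) for b in ("x", "y")]

# Both datasets hashed on A with 4 partitions: every (A, B) pair lines up.
loc_main4 = key_locations(hash_partition(main, "A", 4))
loc_ref4 = key_locations(hash_partition(ref, "A", 4))
colocated = loc_main4 == loc_ref4

# Reference hashed on A with 3 partitions instead: partition numbers no longer agree.
loc_ref3 = key_locations(hash_partition(ref, "A", 3))
mismatched = sorted(k for k, p in loc_main4.items() if loc_ref3[k] != p)

print("same partition count, co-located:", colocated)
print("different partition count, misplaced pairs:", len(mismatched))
```

If the datasets were produced under different node counts, keeping the rows where they are would leave matching keys in different partitions, which is one possible cause of the missing reference records.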

ArndW · Post by **ArndW** » Mon Nov 22, 2010 8:39 am

If you turn on APT_DUMP_SCORE and look at the generated scores do you see any repartitioning being performed?

radhika7983 · Post by **radhika7983** » Mon Nov 22, 2010 9:36 pm

ArndW wrote: If you turn on APT_DUMP_SCORE and look at the generated scores do you see any repartitioning being performed? ...

Haven't tried this. Will look into it. But do you think this will work?

ray.wurlod · Post by **ray.wurlod** » Mon Nov 22, 2010 9:54 pm

No, it's to help you to diagnose where the problem might lie.

zulfi123786 · Post by **zulfi123786** » Mon Nov 22, 2010 10:23 pm

1) Is the config file used in the job that created these datasets the same as the one used in the job that joins them?

2) As suggested look into the dump score

3) Apart from hashing, I hope you are sorting as well. If not, keep Same partitioning and sort on the link using an in-line sort.
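
To illustrate point 3, here is a minimal sketch of why a merge-style join needs both inputs sorted on every join key. It is generic Python, not the Join stage's actual implementation; the `merge_join` function and the sample rows are hypothetical, and each key pair is assumed to be unique on each side.

```python
def merge_join(left, right, keys):
    """Inner sort-merge join that ASSUMES both inputs are already sorted on `keys`
    and that each key combination is unique on each side (illustration only)."""
    def key(row):
        return tuple(row[k] for k in keys)

    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl == kr:
            out.append({**left[i], **right[j]})
            i += 1
            j += 1
        elif kl < kr:
            i += 1  # left row has no partner at the current position; skip it
        else:
            j += 1  # right row has no partner at the current position; skip it
    return out

# Left input sorted on (A, B).
left = [{"A": 1, "B": "x"}, {"A": 1, "B": "y"}, {"A": 2, "B": "x"}, {"A": 2, "B": "y"}]

# Right input ordered on A only: B is out of order within each A.
right_unsorted = [
    {"A": 1, "B": "y", "val": 10}, {"A": 1, "B": "x", "val": 11},
    {"A": 2, "B": "y", "val": 20}, {"A": 2, "B": "x", "val": 21},
]
right_sorted = sorted(right_unsorted, key=lambda r: (r["A"], r["B"]))

unsorted_result = merge_join(left, right_unsorted, ["A", "B"])
sorted_result = merge_join(left, right_sorted, ["A", "B"])

print("matches with right input sorted on A only:", len(unsorted_result))
print("matches with right input sorted on A and B:", len(sorted_result))
```

Every row on the left has a partner on the right, yet the run with the partially sorted input finds fewer matches. Data that is hashed and sorted on A alone can therefore lose matches when the join keys are A and B.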

DSXchange
