Hi,
I have 5 datasets that are created in other jobs, and all of them are hash partitioned on one key column, A. In one job I join all 5 datasets using a single Join stage. Of these 5 datasets, I perform a left outer join on one; the other 4 are right outer. The join uses key A plus one more key, B, so I join these 5 datasets on 2 keys. Because the datasets are already partitioned, I am using Same partitioning on all the links, since all the values of B for a given A will be in the same partition as A. However, this doesn't seem to work. Although I have matching records in all the datasets, I am not getting any records from the references; only the records from the left outer join come out. I tried removing all the reference links except one, keeping just the main dataset and one reference dataset in the Join, and it seemed to work fine. As soon as I added another reference dataset, the partitioning stopped working correctly.
When I repartitioned all the datasets (hash), it worked fine. I am confused here! Since the datasets are already partitioned, do I need to repartition them again? Or should I use a separate Join for each reference dataset? In that case I will end up using 5 joins. Please suggest!
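The co-location argument in the question (every row with the same A is hashed to the same partition, so every (A, B) pair for that A is in that partition too) can be checked with a small Python model. This is a sketch under stated assumptions: `hash_partition`, `left_outer_join` and `partitioned_join` are hypothetical helpers, `zlib.crc32` stands in for DataStage's own hash function, and the rows are invented. It shows only that joining partition i with partition i on (A, B) gives the same rows as one global join when every input is hashed on A with the same function and the same number of partitions; it does not model the Join stage's sort requirement.

```python
import zlib
from collections import defaultdict

def hash_partition(rows, key, num_partitions):
    """Hypothetical stand-in for hash partitioning: route each row by a hash of one key column."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        # zlib.crc32 is deterministic across runs (Python's hash() on str is not).
        parts[zlib.crc32(str(row[key]).encode()) % num_partitions].append(row)
    return parts

def left_outer_join(left, right, keys):
    """In-memory left outer join of two lists of dicts on the given key columns."""
    index = defaultdict(list)
    for r in right:
        index[tuple(r[k] for k in keys)].append(r)
    out = []
    for l in left:
        matches = index.get(tuple(l[k] for k in keys), [])
        out.extend({**l, **r} for r in matches)
        if not matches:
            out.append(dict(l))  # unmatched left row passes through without reference columns
    return out

def partitioned_join(left, right, keys, part_key, num_partitions):
    """Partition both inputs on part_key, then join partition i with partition i only."""
    out = []
    for lp, rp in zip(hash_partition(left, part_key, num_partitions),
                      hash_partition(right, part_key, num_partitions)):
        out.extend(left_outer_join(lp, rp, keys))
    return out

def canonical(rows):
    """Order-independent form of a result, for comparing multisets of rows."""
    return sorted(tuple(sorted(row.items())) for row in rows)

main = [{"A": a, "B": b, "m": f"main-{a}-{b}"} for a in range(5) for b in range(2)]
ref = [{"A": a, "B": b, "r": f"ref-{a}-{b}"} for a in range(5) for b in range(2) if (a + b) % 2 == 0]

single = left_outer_join(main, ref, ["A", "B"])
partitioned = partitioned_join(main, ref, ["A", "B"], part_key="A", num_partitions=4)
print(len(single), canonical(single) == canonical(partitioned))  # 10 True
```

If the argument holds, missing reference rows suggest the inputs differ in some other way (partition count, hash key, or sort order), which is what the replies go on to check.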
same partitioning not working correctly in joiner stage
Radhika Sharma
If you turn on APT_DUMP_SCORE and look at the generated score, do you see any repartitioning being performed?
Re: same partitioning not working correctly in joiner stage
1) Is the config file used in the job that created these datasets the same as the one used in the job that does the join?
2) As suggested, look into the dump score.
3) Apart from hashing, I hope you are sorting as well; if not, use Same partitioning and sort using an inline sort.
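Points 1) and 3) can be illustrated with a small Python model. It is a sketch under stated assumptions, not a description of DataStage internals: a modulus rule (`key % num_partitions`) stands in for the hash function, a two-input sort-merge left outer join stands in for the Join stage, the helper names (`modulus_partition`, `merge_left_outer`, `run_join`) are hypothetical, and a difference in partition count is used as a simplified model of the two jobs running with different config files.

```python
from itertools import zip_longest

def modulus_partition(rows, key, num_partitions):
    """Hypothetical stand-in for hash partitioning: partition = key % num_partitions."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[row[key] % num_partitions].append(row)
    return parts

def merge_left_outer(left, right, keys):
    """Sort-merge left outer join.

    Assumes both inputs are sorted ascending on `keys` and that keys are unique
    on the right input; if the left input is not sorted, matches may be silently missed.
    """
    out, j = [], 0
    for l in left:
        lk = tuple(l[k] for k in keys)
        # Move the right-hand cursor forward past keys smaller than the current left key.
        while j < len(right) and tuple(right[j][k] for k in keys) < lk:
            j += 1
        if j < len(right) and tuple(right[j][k] for k in keys) == lk:
            out.append({**l, **right[j]})
        else:
            out.append(dict(l))  # no match: left row passes through without reference columns
    return out

def run_join(main_parts, ref_parts, keys):
    """Join partition i of main with partition i of ref; missing partitions are treated as empty."""
    out = []
    for mp, rp in zip_longest(main_parts, ref_parts, fillvalue=[]):
        out.extend(merge_left_outer(mp, rp, keys))
    return out

def matched(rows):
    """Count output rows that picked up reference columns."""
    return sum("r" in row for row in rows)

keys = ["A", "B"]
main = [{"A": a, "B": b, "m": a * 10 + b} for a in range(8) for b in range(2)]
ref = [{"A": a, "B": b, "r": a * 10 + b} for a in range(8) for b in range(2)]

# 1) Both inputs partitioned into 4 on A and sorted on (A, B): every reference row is found.
ok = run_join(modulus_partition(main, "A", 4), modulus_partition(ref, "A", 4), keys)
# 2) Different partition counts (4 vs 2): some keys land in non-matching partitions.
mismatch = run_join(modulus_partition(main, "A", 4), modulus_partition(ref, "A", 2), keys)
# 3) Same partition count, but the main input is not sorted within its partitions.
unsorted = run_join([list(reversed(p)) for p in modulus_partition(main, "A", 4)],
                    modulus_partition(ref, "A", 4), keys)

print(matched(ok), matched(mismatch), matched(unsorted))  # 16 8 4
```

In the last two cases every main row still comes out, just without its reference columns, which resembles the symptom described in the question. Partitioning and sorting both inputs the same way, as in the first case, restores the matches in this model.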