Question on Partitioning before join stage
Moderators: chulett, rschirm, roy
Question on Partitioning before join stage
Hi All ,
I have some basic question about join stage . The design of the job is as below
Input1:comes from oracle enterprise stage select col1 , col2 , col3 , col4 from table
Input2: dataset ( dataset is generated in previous job with design Oracle enterprise ---> Sort Col1 , Col2 --> Dataset )
Input1 ---> Sort stage ( sort on Key columns ) --> Left link of join stage
Input2 ---> Sort Stage ( Don't sort previous sorted on key columns ) ---> Right link of join stage
Now My question is where should I do hash partitioning on key columns ? Is it on the input links of Sort stages or input links of join stage ?
Please suggest .
Thanks
I have some basic question about join stage . The design of the job is as below
Input1:comes from oracle enterprise stage select col1 , col2 , col3 , col4 from table
Input2: dataset ( dataset is generated in previous job with design Oracle enterprise ---> Sort Col1 , Col2 --> Dataset )
Input1 ---> Sort stage ( sort on Key columns ) --> Left link of join stage
Input2 ---> Sort Stage ( Don't sort previous sorted on key columns ) ---> Right link of join stage
Now My question is where should I do hash partitioning on key columns ? Is it on the input links of Sort stages or input links of join stage ?
Please suggest .
Thanks
-
- Premium Member
- Posts: 187
- Joined: Thu Apr 14, 2011 5:10 pm
Re: Question on Partitioning before join stage
Hi,
Is this is what your are looking for ???Can you please quote your job design in "code" with input and out put links
If job design is clear you can expect more responses
Thank you
Code: Select all
Left Link (sort input1),right link (sort input2)>>>>join stage >>>Output (???)
If job design is clear you can expect more responses
Thank you
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Code: Select all
TABLE -> Sort ---->Join ->o/p
^
||
DATASET --> Don't sort previously sorted
Here I am doing partitioning first ( before sort stages ) & keeping same partitioning in join stage ..Just wanted to know if it is correct
What I used to do is Hash partitioning on Sort stage and same partitioning on join stage input links. Partitioning key you can give same as joining key. So you will get matching data in one partition.
Thanks & Regards
Jose Johny
Project Engineer
Wipro Technologies |Bangalore
"Life is a process of cultivating goodness & removing evilness"
Jose Johny
Project Engineer
Wipro Technologies |Bangalore
"Life is a process of cultivating goodness & removing evilness"
At least in front of the Sort stage for Input 1, if not earlier as suggested by Ray as a best practice (in your job that may be the earliest location you can). Because Input 2 is already sorted, it should have also been partitioned at that time.
Regards,
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
All generalizations are false, including this one - Mark Twain.
Repartitioning a dataset invalidates the sort order (it is no longer guaranteed) and potentially disrupts the order of the data in the partitions of a dataset. You should properly partition the data when you sort it, on the input to or prior to the sort stage.
In the job which creates your input2 dataset, if you haven't disabled partition insertion, more than likely the data has been partitioned automatically by DataStage when it is sorted unless your job design dictates otherwise. My personal preference and recommendation is to specify the partitioning so that you know it has been done to properly meet the needs of the downstream processing logic.
Regards,
In the job which creates your input2 dataset, if you haven't disabled partition insertion, more than likely the data has been partitioned automatically by DataStage when it is sorted unless your job design dictates otherwise. My personal preference and recommendation is to specify the partitioning so that you know it has been done to properly meet the needs of the downstream processing logic.
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
All generalizations are false, including this one - Mark Twain.