sorted input to Join
Must the Join stage have sorted inputs (all of them, or any one)?
Hi,
To quote the help:
The data sets input to the Join stage must be key partitioned and sorted.
Last edited by roy on Wed Feb 08, 2006 10:01 am, edited 1 time in total.
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
![Image](http://www.worldcommunitygrid.org/images/logo.gif)
Re: sorted input to Join
djoni wrote: Must Join stage have sorted inputs (all or any one)
It is recommended to sort before the join so that the join is more efficient. If you are sorting, then all sources must be sorted the same way before the join.
It is even better to sort in the RDBMS if the join key is already indexed there, so that you save processing time and resources on the DataStage server.
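To illustrate why a join that relies on sorted input cares about sort order, here is a minimal sketch of a generic sort-merge join in plain Python. It is not DataStage code and makes no claim about how the Join stage is implemented internally; `merge_join` and the sample rows are made up for the example.

```python
def merge_join(left, right):
    """Inner join of two lists of (key, value) tuples.

    Assumes BOTH inputs are sorted ascending by key. The function never
    looks backwards, so unsorted input silently loses matches.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-hand rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left-hand row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for _, rv in right[j:j_end]:
                    out.append((lk, left[i][1], rv))
                i += 1
            j = j_end
    return out


left = [(3, "c"), (1, "a"), (2, "b")]   # deliberately unsorted
right = [(1, "x"), (2, "y"), (3, "z")]

unsorted_result = merge_join(left, right)
sorted_result = merge_join(sorted(left), sorted(right))

print(len(unsorted_result), len(sorted_result))
```

With the unsorted left input the merge skips past keys 1 and 2 on the right before it ever sees them on the left, so only one of the three matching pairs comes out. Sorting both sides the same way recovers all three.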
Re: sorted input to Join
felixyong wrote: It is recommended to sort before join so that the join will be more efficient. If you're sorting then all sources must be sorted the same way before join. It is even better to sort using the RDBMS if the join is already indexed in the RDBMS so that you can save processing time & resources in DataStage Server.
Recommended or mandatory?
Mandatory, if the manual is to be believed. I believe it, so I have always key partitioned and sorted Join stage inputs. Perhaps you'd like to try without, and let us know the result?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod wrote: Mandatory if the manual is to be believed. I believe it so have always key partitioned and sorted Join stage inputs. Perhaps you'd like to try without, and let us know the result?
It runs well on two unsorted sequential files, auto partitioned, with no sort.
So, is something wrong with the manual and ... the EE Essential course?
djoni wrote: Runs well on two un-sorted sequential files, auto partitioned no sort. So, is something wrong with the manual and .... EE Essential course?
The problem shows up when you run the job on multiple nodes with a large amount of data. If you have not hash partitioned on the key, two records with the same key value can end up in different partitions, and the join between them will not take place. We have faced this type of issue in one of our projects.
As far as the sort is concerned, it is basically there to improve performance. So partitioning is mandatory, but sorting is preferable.
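The partitioning point can be sketched in a few lines of plain Python. This is only an illustration, not DataStage code: `hash_partition`, `round_robin_partition` and `join_per_partition` are hypothetical helpers, two "nodes" are simulated with lists, and each node joins only the rows it holds.

```python
import zlib


def hash_partition(rows, nodes):
    """Send each (key, value) row to a partition chosen from its key."""
    parts = [[] for _ in range(nodes)]
    for key, value in rows:
        # crc32 is used only because it is deterministic across runs.
        parts[zlib.crc32(str(key).encode()) % nodes].append((key, value))
    return parts


def round_robin_partition(rows, nodes):
    """Deal rows out in arrival order, ignoring the key."""
    parts = [[] for _ in range(nodes)]
    for n, row in enumerate(rows):
        parts[n % nodes].append(row)
    return parts


def join_per_partition(left_parts, right_parts):
    """Inner join, done independently inside each partition pair."""
    out = []
    for lp, rp in zip(left_parts, right_parts):
        lookup = {}
        for k, v in rp:
            lookup.setdefault(k, []).append(v)
        for k, v in lp:
            for rv in lookup.get(k, []):
                out.append((k, v, rv))
    return sorted(out)


left = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
right = [(4, "w"), (3, "x"), (2, "y"), (1, "z")]

hash_joined = join_per_partition(hash_partition(left, 2), hash_partition(right, 2))
rr_joined = join_per_partition(round_robin_partition(left, 2), round_robin_partition(right, 2))

print(len(hash_joined), len(rr_joined))
```

With hash partitioning on the key, equal keys from both inputs always land on the same node, so all four matches are found. With round robin on this sample, every matching pair is split across the two nodes and the join returns nothing, which is the kind of silent row loss described above.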
Regards
dsusr