Hi All,
To use a Join stage, the data should be hash partitioned and sorted on the join key. In our jobs we join two tables. Before the Join stage, we use a Sort stage on each input link to sort and hash partition the data.
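For readers unfamiliar with why a join wants both inputs hash partitioned and sorted on the key, here is a minimal Python sketch of the idea. It is not DataStage code and does not reflect how the engine is implemented; the function names (`hash_partition`, `merge_join`, `partitioned_join`) and the sample rows are made up for illustration. Hash partitioning sends equal keys to the same partition, and sorting lets each partition be joined in a single forward pass.

```python
from collections import defaultdict


def hash_partition(rows, key, n_partitions):
    """Assign each row to a partition by hashing its key value,
    so rows with equal keys always land in the same partition."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key]) % n_partitions].append(row)
    return parts


def merge_join(left, right, key):
    """Inner join of two lists that are already sorted on `key`,
    done in one forward pass over both inputs."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-hand rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with that key against the run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out


def partitioned_join(left, right, key, n_partitions=4):
    """Hash partition both inputs, sort each partition, then join
    matching partitions independently (hypothetical helper)."""
    lparts = hash_partition(left, key, n_partitions)
    rparts = hash_partition(right, key, n_partitions)
    result = []
    for p in range(n_partitions):
        l_sorted = sorted(lparts[p], key=lambda r: r[key])
        r_sorted = sorted(rparts[p], key=lambda r: r[key])
        result.extend(merge_join(l_sorted, r_sorted, key))
    return result


left = [{"id": 3, "a": "x"}, {"id": 1, "a": "y"},
        {"id": 2, "a": "z"}, {"id": 1, "a": "w"}]
right = [{"id": 1, "b": 10}, {"id": 3, "b": 30}, {"id": 4, "b": 40}]
joined = partitioned_join(left, right, "id", n_partitions=2)
```

If the inputs were partitioned differently, or not sorted within a partition, the single-pass merge would miss matches, which is why both conditions are stated as prerequisites.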
Is there any advantage to using an explicit Sort stage over in-stage sorting? By in-stage sorting I mean the Sort option inside the Join stage.
Is an explicit sort better performance-wise than an in-stage sort?
Can anyone please clarify and provide more details on this?
Thanks in advance!
What is difference between explicit Sort stage and sort ....
When you need sort functionality along with another stage, such as a Join in this case, it is always better to use in-stage (implicit) sorting. This has advantages over adding a separate stage.
For example, in your case you have probably put a Sort stage on each of the two links before the Join. This increases the number of processes in the job. Running on 'n' nodes multiplies the number of sort processes, each of which requires additional resources. With implicit or in-stage sorting, the DS engine sorts the data directly in memory and treats the join and sort as a single process.
Rajeev
Nobody knows Everything,
But U should not be the One who knows Nothing.
Explicit Sort stages allow you to more easily control the memory allocated to sorting, and to perform sub-sorts without re-sorting the already-sorted sort key columns.
All forms of sort will create extra processes. Look at the score to verify that this is the case.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.