Regarding sorting data before joining

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

ThilSe
Participant
Posts: 80
Joined: Thu Jun 09, 2005 7:45 am

Post by ThilSe »

Hi,

I have a question.

Is there any difference between using a Sort stage and the Perform Sort option on the Partitioning tab when joining data, other than:
-> the ability to set the Already Sorted option,
-> the use of UNIX sort, and
-> visual appearance?

Or are there any other benefits?

Thanks/Regards
Senthil
ameyvaidya
Charter Member
Posts: 166
Joined: Wed Mar 16, 2005 6:52 am
Location: Mumbai, India

Post by ameyvaidya »

2 more:
1 For large data set sizes (>20 MB) the Sort Stage is better.

2 I don't believe the on-link sort can do a sequential sort.
Amey Vaidya
I am rarely happier than when spending an entire day programming my computer to perform automatically a task that it would otherwise take me a good ten seconds to do by hand.
- Douglas Adams
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

What do you mean by "sequential sort"?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

For better performance, a dedicated Sort stage can always be chosen; it has its own scratch disk space.
UNIX sort makes use of the UNIX-level sort utility. It may be more efficient for data with a small number of records.
If the incoming data is already sorted, you can enable the Already Sorted option to get better performance by avoiding re-sorting.
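
To illustrate the last point outside of DataStage, here is a minimal Python sketch of why declaring input as already sorted can save work: a linear order check replaces a full sort. The function names and the `already_sorted` flag are hypothetical, and raising an error on out-of-order input is an assumption of this sketch, not a statement about how the Sort stage behaves.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, str]  # hypothetical row shape: (join key, payload)


def is_sorted(rows: List[Row], key: Callable[[Row], int]) -> bool:
    """Single O(n) pass: every row's key is <= the next row's key."""
    return all(key(a) <= key(b) for a, b in zip(rows, rows[1:]))


def sort_unless_sorted(
    rows: List[Row], key: Callable[[Row], int], already_sorted: bool = False
) -> List[Row]:
    """Return rows ordered by key.

    When the caller declares the input already sorted, only verify the
    order (cheap) instead of re-sorting (O(n log n)).
    """
    if already_sorted:
        if not is_sorted(rows, key):
            # Assumption of this sketch: reject input that breaks the promise.
            raise ValueError("input was declared sorted but is out of order")
        return list(rows)
    return sorted(rows, key=key)
```

For example, `sort_unless_sorted(rows, key=lambda r: r[0], already_sorted=True)` returns the rows unchanged when they are in key order, without performing a sort.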
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ameyvaidya
Charter Member
Posts: 166
Joined: Wed Mar 16, 2005 6:52 am
Location: Mumbai, India

Post by ameyvaidya »

ray.wurlod wrote: What do you mean by "sequential sort"?
What I meant was that the Sort stage can work in both sequential and parallel mode, while the on-link sort, because it has to work on partitioned data, can't.

I don't recollect whether on-link sorting is available on the input of stages running in sequential mode.
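
To make the distinction concrete, here is a rough Python sketch (not DataStage code) contrasting a sequential sort over the whole data set with a sort applied separately to each key-partitioned subset. The function names and the simple modulo hash partitioner are assumptions made for illustration only.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, str]  # hypothetical row shape: (join key, payload)


def hash_partition(
    rows: List[Row], key: Callable[[Row], int], n_partitions: int
) -> List[List[Row]]:
    """Send each row to a partition chosen from its key, so equal keys co-locate."""
    parts: List[List[Row]] = [[] for _ in range(n_partitions)]
    for row in rows:
        parts[hash(key(row)) % n_partitions].append(row)
    return parts


def partitioned_sort(
    rows: List[Row], key: Callable[[Row], int], n_partitions: int
) -> List[List[Row]]:
    """Parallel-style sort: each partition is ordered independently."""
    return [sorted(p, key=key) for p in hash_partition(rows, key, n_partitions)]


def sequential_sort(rows: List[Row], key: Callable[[Row], int]) -> List[Row]:
    """Sequential sort: one stream, one total order over all rows."""
    return sorted(rows, key=key)
```

The partitioned version yields data that is sorted within each partition but not globally. That is enough for a join when both inputs are partitioned on the same key, whereas the sequential version produces a single totally ordered stream.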