What is difference between explicit Sort stage and sort ....



mavrick21
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

Hi All,

To use a Join stage, the data should be hash partitioned and sorted. In our jobs we join two tables. Before the Join stage, we use a Sort stage on each input link to sort and hash partition the data.
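
To illustrate why a join wants both inputs hash partitioned and sorted on the key, here is a minimal Python sketch of the general technique. It is not DataStage code; the function names (`hash_partition`, `merge_join`, `parallel_join`) and the sample rows are hypothetical. Hash partitioning sends rows with the same key to the same partition, and sorting lets each partition be joined in a single merge pass.

```python
def hash_partition(rows, key, num_partitions):
    """Place each row in a partition chosen by hashing its join key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(row[key]) % num_partitions].append(row)
    return partitions


def merge_join(left, right, key):
    """Inner join of two lists that are already sorted on `key`."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-hand rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out


def parallel_join(left_rows, right_rows, key, num_partitions=4):
    """Hash partition both inputs, sort each partition, then merge-join."""
    left_parts = hash_partition(left_rows, key, num_partitions)
    right_parts = hash_partition(right_rows, key, num_partitions)
    result = []
    for lp, rp in zip(left_parts, right_parts):
        lp_sorted = sorted(lp, key=lambda r: r[key])
        rp_sorted = sorted(rp, key=lambda r: r[key])
        result.extend(merge_join(lp_sorted, rp_sorted, key))
    return result


customers = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
orders = [{"id": 2, "amt": 10}, {"id": 1, "amt": 5}, {"id": 2, "amt": 7}, {"id": 4, "amt": 1}]
joined = parallel_join(customers, orders, "id")
```

Because both inputs use the same hash function on the same key, matching rows always meet in the same partition, so no partition needs to look at another one's data.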

Does using an explicit Sort stage have any advantage over in-stage sorting? By in-stage sorting I mean the Sort option inside the Join stage.

Does an explicit sort perform better than an in-stage sort?

Can anyone please clarify and provide more details on this?

Thanks in advance!
Raghavendra
Participant
Posts: 147
Joined: Sat Apr 30, 2005 1:23 am
Location: Bangalore,India

Post by Raghavendra »

An explicit Sort stage uses temporary disk space when performing a sort. I believe that when you are handling huge volumes of data you will not run into resource problems, because the sort uses temporary disk space.
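
As a rough illustration of how a sort can use temporary disk space to handle more data than fits in memory, here is a minimal Python sketch of a generic external merge sort. It is an assumption-laden sketch of the general idea, not a description of how the DataStage Sort stage is implemented; `external_sort`, `_spill`, and `chunk_size` are hypothetical names.

```python
import heapq
import os
import tempfile


def _spill(chunk):
    """Sort one in-memory chunk and write it to a temporary run file."""
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "w") as f:
        for value in sorted(chunk):
            f.write(f"{value}\n")
    return path


def external_sort(values, chunk_size):
    """Yield integers in sorted order, holding at most chunk_size in memory
    while building runs; the sorted runs live in temporary files on disk."""
    run_paths = []
    chunk = []
    for value in values:
        chunk.append(value)
        if len(chunk) >= chunk_size:
            run_paths.append(_spill(chunk))
            chunk = []
    if chunk:
        run_paths.append(_spill(chunk))

    files = [open(path) for path in run_paths]
    try:
        # heapq.merge streams the sorted runs, reading them lazily.
        yield from heapq.merge(*(map(int, f) for f in files))
    finally:
        for f in files:
            f.close()
        for path in run_paths:
            os.remove(path)


result = list(external_sort([5, 3, 9, 1, 7, 2, 8], chunk_size=3))
```

The memory limit (`chunk_size` here) decides how many runs are written: a smaller limit means more temporary files and more merge work, which is the trade-off behind tuning sort memory.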

Let's see what our DS experts have to say about this query.
Raghavendra
Dare to dream and care to achieve ...
rajeevn80
Participant
Posts: 28
Joined: Mon Jan 31, 2005 10:58 pm

Post by rajeevn80 »

When you need sort functionality along with another stage, like a Join in this case, it is always better to do in-stage (implicit) sorting. This has advantages over adding an extra stage.
For example, in your case you might have put two Sort stages, one on each link before the Join. This increases the number of processes in the job. Running on 'n' nodes creates even more sort processes, each requiring additional resources. With implicit or in-stage sorting, the DS engine sorts the data directly in memory and treats the join and sort as a single process.
Rajeev
Nobody knows Everything,
But U should not be the One who knows Nothing.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Explicit Sort stages allow you to more easily control the memory allocated to sorting, and to perform sub-sorts without re-sorting the already-sorted sort key columns.
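
To show what a sub-sort saves, here is a minimal Python sketch of the general idea, not DataStage code. It assumes the rows already arrive sorted on one column and only need ordering on a second column within each group; `sub_sort` and the column names `cust` and `date` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def sub_sort(rows, sorted_key, sub_key):
    """Order rows on sub_key within each run of equal sorted_key values.

    Assumes rows are already sorted (or at least grouped) on sorted_key,
    so only one group at a time has to be held and sorted.
    """
    out = []
    for _, group in groupby(rows, key=itemgetter(sorted_key)):
        out.extend(sorted(group, key=itemgetter(sub_key)))
    return out


rows = [
    {"cust": 1, "date": 3},
    {"cust": 1, "date": 1},
    {"cust": 2, "date": 2},
    {"cust": 2, "date": 0},
]
ordered = sub_sort(rows, "cust", "date")
```

The result matches a full sort on both columns, but each sort call only touches one group, so the work and memory needed scale with the largest group instead of the whole input.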

All forms of sort will create extra processes. Look at the score to verify that this is the case.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mavrick21
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

Thanks, all!