Need help with sorting the data
Posted: Wed Dec 23, 2009 3:54 am
by zulfi123786
I am joining two datasets using the Join stage. To sort the incoming data, which of these two options is better?
1) In-link sort
2) Two explicit Sort stages, one on each link
It would be a great help if you could also give the reason.
Thanks.
Posted: Wed Dec 23, 2009 4:17 am
by priyadarshikunal
Both are the same as far as the sorting itself is concerned (using the DataStage sort), because both insert a tsort operator.
However, an explicit Sort stage gives you more options than an in-link sort, such as generating a key change column and choosing the sort utility (DataStage or UNIX). It can also dump statistics, and you can restrict memory usage from the stage itself.
Posted: Wed Dec 23, 2009 5:54 am
by zulfi123786
My concern is only sorting the data; no other options are required.
Posted: Wed Dec 23, 2009 7:25 am
by priyadarshikunal
zulfi123786 wrote: My concern is only sorting the data; no other options are required.
Then I don't think it makes any difference during execution, but I prefer an explicit Sort stage.
Posted: Wed Dec 23, 2009 7:50 am
by srinivas.g
Performance-wise, an inline sort is better than explicit Sort stages.
Posted: Wed Dec 23, 2009 8:09 am
by chulett
srinivas.g wrote: Performance-wise, an inline sort is better than explicit Sort stages.
Based on what? Under the covers they're both the same tsort operator.
Posted: Wed Dec 23, 2009 10:31 am
by zulfi123786
Could you please mention which DataStage document discusses tsort?
I did not find anything in the DS Parallel Job Developer Guide saying that the Sort stage inserts a tsort operator.
Posted: Wed Dec 23, 2009 10:59 am
by chulett
You'd probably have to go back to an ORCHESTRATE manual for that. Check the Generated OSH tab in the job, you'll see them there.
Posted: Wed Dec 23, 2009 4:21 pm
by ray.wurlod
srinivas.g wrote: Performance-wise, an inline sort is better than explicit Sort stages.
I disagree 100%. But I'd be interested to hear your reasons.
Two reasons an explicit Sort stage is better (and can be better for performance):
- you can control the amount of memory allocated
- you can generate key change indicators
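
For readers unfamiliar with key change indicators, here is a rough Python sketch of the idea, not DataStage code: after sorting, each row gets a flag that is 1 on the first row of a key group and 0 on the rest. The function name, the `cust_id` and `amount` columns, and the sample rows are all made up for illustration; `keyChange` is used here simply as a column name.

```python
from operator import itemgetter


def sort_with_key_change(rows, key):
    """Sort rows by `key` and flag the first row of each key group.

    Illustrative only: this mimics the idea of a key change column,
    it is not how the tsort operator is implemented.
    """
    # sorted() is stable, so rows with equal keys keep their input order.
    ordered = sorted(rows, key=itemgetter(key))

    result = []
    previous = object()  # sentinel that never equals a real key value
    for row in ordered:
        flagged = dict(row)  # copy so the input rows are left untouched
        flagged["keyChange"] = 1 if row[key] != previous else 0
        previous = row[key]
        result.append(flagged)
    return result


# Hypothetical sample data.
rows = [
    {"cust_id": 2, "amount": 10},
    {"cust_id": 1, "amount": 5},
    {"cust_id": 2, "amount": 7},
    {"cust_id": 1, "amount": 3},
]

flagged_rows = sort_with_key_change(rows, "cust_id")
for row in flagged_rows:
    print(row)
```

A downstream step can then detect group boundaries by testing the flag instead of comparing each row with the previous one, which is the kind of thing an in-link sort alone does not give you.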