Implicit vs explicit sort

Edwink
Participant
Posts: 47
Joined: Sat Aug 19, 2006 4:57 am
Location: Chennai

Post by Edwink »

Friends,
Can anyone advise me?
I want to merge two files.
Should I use an explicit Sort stage ahead of the Merge stage, or leave the Merge stage to take care of sorting before merging?
Which will be better in performance? Can u please explain the reson also?

Thank you
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

A reson (literally "thingy") is one of the fundamental units of magic - it's a subdivision of the thaum. "U" is one of the participants on this forum: the second person personal pronoun is "you".

You provide no definition of performance, but ask us to comment on how something undefined can be changed. That is unfair.

The Sort stage gives more options, particularly for memory allocation, though that can also be accomplished using the APT_TSORT_STRESS_BLOCKSIZE environment variable. All else being equal, the inserted tsort operator, the sort specified on the input link and the explicit Sort stage on that input link will all perform an identical sort operation. Presumably, then, under those assumptions (all else being equal or, as the economists are fond of saying, ceteris paribus), your choice is irrelevant to "performance" (whatever that means).

Once you start tweaking properties, of course, all bets are off.
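
To illustrate the "identical sort operation" point, here is a rough plain-Python analogy, not DataStage code. The function names and sample rows are made up for the illustration, and `heapq.merge` is a simple ordered interleave rather than the Merge stage's master/update behaviour. The only thing it is meant to show is that the same sort runs either way; what changes is where in the design it appears.

```python
import heapq
from operator import itemgetter


def merge_with_explicit_sort(left, right, key):
    """Analogy for an explicit Sort stage on each input link (hypothetical name)."""
    sorted_left = sorted(left, key=key)    # separate, visible sort step
    sorted_right = sorted(right, key=key)  # separate, visible sort step
    return list(heapq.merge(sorted_left, sorted_right, key=key))


def merge_with_implicit_sort(left, right, key):
    """Analogy for letting the merge step sort its own inputs (hypothetical name)."""
    # The same sort is applied; it is simply hidden inside the merge step.
    return list(heapq.merge(sorted(left, key=key), sorted(right, key=key), key=key))


# Made-up sample rows: (key, value)
left_rows = [(3, "c"), (1, "a")]
right_rows = [(4, "d"), (2, "b")]
by_key = itemgetter(0)

explicit_result = merge_with_explicit_sort(left_rows, right_rows, by_key)
implicit_result = merge_with_implicit_sort(left_rows, right_rows, by_key)
print(explicit_result == implicit_result)
```

With default properties the two routes do the same work, which is why the choice only starts to matter once properties are tuned.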
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Edwink
Participant
Posts: 47
Joined: Sat Aug 19, 2006 4:57 am
Location: Chennai

Post by Edwink »

Sorry :(
aakashahuja
Premium Member
Posts: 210
Joined: Wed Feb 16, 2005 7:17 am

Post by aakashahuja »

The decision to use an explicit (external) or implicit sort should be driven by many factors other than performance, although those factors may well end up driving performance. Some of them are:

1. An explicit sort would definitely use the sort disk defined in your configuration file, but an implicit sort would not (this is what I have heard).
2. If you need more flexibility in your sort, an explicit Sort stage provides more options than the implicit one.
3. The data you have.

Only once you have these details can you decide what you want and what would give you better performance.

Cheers
Aakash
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

The maintainable approach is to let the job do its own sorting and you just configure the merge. Parallel sorts are quite fast. You only need to make your design more complicated if you need more speed - so do a test run of the simple approach first to see if it meets your needs.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The Sort stage is very useful for preventing unnecessary sorting, using Sort Mode "Don't Sort (Previously Sorted)".
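
As a rough plain-Python sketch of the idea (not DataStage code): a sort step that is told its input is already sorted passes the rows through instead of sorting them again. The names `sort_step` and `previously_sorted` are invented for the illustration, and the order check is an assumption of this sketch, not a statement about what the stage does internally.

```python
from operator import itemgetter


def is_sorted(rows, key):
    """Return True if rows are in non-decreasing order on the given key."""
    return all(key(a) <= key(b) for a, b in zip(rows, rows[1:]))


def sort_step(rows, key, previously_sorted=False):
    """Hypothetical sort step: skips the sort when input is declared sorted."""
    if previously_sorted:
        # Assumption of this sketch: verify the declaration rather than trust it.
        if not is_sorted(rows, key):
            raise ValueError("input was declared previously sorted but is out of order")
        return rows  # passed through untouched, no sort performed
    return sorted(rows, key=key)


by_key = itemgetter(0)
already_sorted = [(1, "a"), (2, "b"), (3, "c")]
unsorted_rows = [(2, "b"), (1, "a")]

passed_through = sort_step(already_sorted, by_key, previously_sorted=True)
freshly_sorted = sort_step(unsorted_rows, by_key)
print(passed_through is already_sorted, freshly_sorted)
```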
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

I like that. It satisfies the objective of keeping the design simple, and it avoids playing around with the environment variables that try to prevent implicit sorts from being added to the job. Does it add any overhead to the job, or is it removed from the executing job (like a straight-through Copy stage)?