Friends,
Can anyone advise me?
I want to merge two files.
Should I use an explicit Sort stage ahead of the Merge stage, or leave the Merge stage to take care of sorting before merging?
Which will be better in performance? Can u please explain the reson also?
Thank you
implicit Vs explicit sort
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
A reson (literally "thingy") is one of the fundamental units of magic - it's a subdivision of the thaum. "U" is one of the participants on this forum: the second person personal pronoun is "you".
You provide no definition of performance, but ask us to comment on how something undefined can be changed. That is unfair.
The Sort stage gives more options, particularly memory allocation, though that can also be accomplished using the APT_TSORT_STRESS_BLOCKSIZE environment variable. All else being equal, the inserted tsort operator, the sort specified on the input link and the explicit Sort stage on that input link will all perform an identical sort operation. Presumably, then, under those assumptions (all else being equal or, as the economists are fond of saying, ceteris paribus), your choice is irrelevant to "performance" (whatever that means).
Once you start tweaking properties, of course, all bets are off.
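To illustrate the "identical sort operation" point outside DataStage, here is a rough Python analogy, not DataStage code. The function names and sample records are made up for illustration. It assumes a merge that needs both inputs sorted on the key, and shows that sorting upstream and letting the merge step insert the sort itself give the same output.

```python
import heapq
from operator import itemgetter


def merge_presorted(left, right, key):
    """Merge two inputs that are ALREADY sorted on `key` (sort done upstream)."""
    return list(heapq.merge(left, right, key=key))


def merge_with_inserted_sort(left, right, key):
    """Sort each input on `key` first, then merge (sort inserted by the merge step)."""
    return merge_presorted(sorted(left, key=key), sorted(right, key=key), key)


# Hypothetical sample records: (key, payload)
by_id = itemgetter(0)
master = [(3, "c"), (1, "a"), (2, "b")]
update = [(2, "B"), (4, "D"), (1, "A")]

# "Explicit" variant: the caller sorts each input before handing it to the merge.
explicit = merge_presorted(sorted(master, key=by_id), sorted(update, key=by_id), by_id)

# "Implicit" variant: the merge wrapper performs the same sort on the caller's behalf.
implicit = merge_with_inserted_sort(master, update, by_id)

print(explicit == implicit)
```

In both variants the same sort runs on the same data; only the place where it is requested differs. Differences would only appear once the two sorts are configured differently.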
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 210
- Joined: Wed Feb 16, 2005 7:17 am
The decision to use an explicit (external) or implicit sort should be driven by many factors apart from performance (though those factors may end up driving performance). Some of them are:
1. An explicit sort would definitely use the sort disk defined in your config file, but the implicit one would not (this is what I have heard).
2. If you need more flexibility in your sort, an explicit stage provides more options than the implicit one.
3. The data you have.
Only once you have these details can you decide what you want and what would give you better performance.
Cheers
Aakash
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
The maintainable approach is to let the job do its own sorting and you just configure the merge. Parallel sorts are quite fast. You only need to make your design more complicated if you need more speed - so do a test run of the simple approach first to see if it meets your needs.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
I like that. It satisfies the objective of keeping the design simple and it avoids playing around with the environment variables that try to prevent implicit sorts from being added to the job. Does it add any overhead to the job, or does it get removed from the executing job (like a straight-through Copy stage)?
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn