Could anyone explain the difference between the Sort stage and an inline sort? Is there a performance difference?
Thanks a lot !
Difference between Sort stage and inline Sort
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
They both sort data if configured correctly. "Performance issue" is, more than anything, a matter of expectations. To determine which completes more quickly you can create jobs that use both methods, but remember to be fair - allow for cache effects (ideally re-boot server between tests), and sort many different sets of data with different characteristics. Post your results here, if you would be so kind.
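A fair comparison along those lines could be driven by a small timing harness. The following is only a sketch: it assumes each sort method can be wrapped as a Python callable (for example, one that launches the job and waits for it to finish), and `time_method`, `make_datasets`, `method_a` and `method_b` are hypothetical names, not anything DataStage provides. It repeats every run, reports the median, and uses inputs with different characteristics. It does not deal with cache effects, which still need handling outside the script.

```python
import random
import statistics
import time


def time_method(method, datasets, repeats=3):
    """Return the median wall-clock seconds per dataset for one sort method."""
    medians = []
    for data in datasets:
        runs = []
        for _ in range(repeats):
            work = list(data)  # fresh copy so every run sorts the same input
            start = time.perf_counter()
            method(work)
            runs.append(time.perf_counter() - start)
        medians.append(statistics.median(runs))
    return medians


def make_datasets(size=10_000, seed=42):
    """Build inputs with different characteristics (hypothetical test data)."""
    rng = random.Random(seed)
    base = [rng.randrange(1_000_000) for _ in range(size)]
    return {
        "random": base,
        "already_sorted": sorted(base),
        "reversed": sorted(base, reverse=True),
        "few_distinct_keys": [value % 10 for value in base],
    }


if __name__ == "__main__":
    datasets = make_datasets()
    # Stand-ins only: replace these with callables that run the two real jobs.
    methods = {"method_a": sorted, "method_b": lambda rows: rows.sort()}
    for name, method in methods.items():
        timings = time_method(method, datasets.values())
        for label, seconds in zip(datasets, timings):
            print(f"{name:10s} {label:18s} {seconds:.6f}s")
```

Taking the median rather than a single run keeps one slow outlier from deciding the result.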
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Hi Ray,
It's a known fact that an explicit sort is more efficient than an implicit sort. But I really don't understand how the DataStage sort can be more efficient than the UNIX one. For a Data Set, fine, the DataStage sort is the only way. But for a sequential file, I was expecting UNIX to be faster. Is it something to do with hashing the key and processing?
Also, for counting records: when I tested with some small files (a few GB), the UNIX wc command was far faster than the Aggregator stage.
regards
kumar
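For context on the wc comparison: `wc -l` only counts newline characters and never interprets the records themselves, which is one plausible reason it is hard to beat for a plain count. A minimal sketch of the same idea, assuming newline-terminated records (`count_records` is a hypothetical helper name):

```python
def count_records(path, chunk_size=1024 * 1024):
    """Count newline-terminated records by scanning raw bytes, like wc -l."""
    count = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            # No parsing into columns: just count the newline bytes.
            count += chunk.count(b"\n")
    return count
```

Like `wc -l`, this does not count a final line that lacks a trailing newline.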
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Neither the Sort stage nor the implicit sort (by which I assume you mean sorting specified on the input link of a stage) goes out to the UNIX sort command - at least by default. If you want to bring UNIX sort into the mix, why not bring in third-party sort utilities such as SyncSort or CoSort as well? These survive solely by being faster than anything else.
DataStage has overheads that the UNIX sort command does not have, even if processing a single sequential file. Not least of these is the process overhead - conductor, section leader(s), players. On the other hand, the data stream is within DataStage, so you can go on and do other things with it.
-
- Participant
- Posts: 60
- Joined: Sat Jan 24, 2004 12:52 pm
- Location: Mount Carmel, IL
Boys and girls, the inline sort and the Sort stage are the same thing. Check the code: tsort = tsort. You have a few extra options in the Sort stage that you don't get in the inline sort, but other than that, it's the same.
As for why tsort is faster than a plain UNIX sort: it sorts each partition in parallel (assuming you have a config file with multiple nodes defined). Unless you write an elaborate shell script, sort won't do that on its own. Back in PX 6.x, you could call SyncSort directly from the Sort stage, but most shops couldn't afford licenses for both SyncSort and PX, so they took that functionality out at 7.0.
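To illustrate the per-partition idea, here is a rough sketch in Python. It is not how tsort is implemented; it only assumes rows are hash-partitioned on the sort key, each partition is sorted independently (the part that could run on separate nodes), and the sorted partitions are merged when a single ordered stream is wanted. All function names are hypothetical.

```python
import heapq
import zlib


def hash_partition(rows, key, partitions):
    """Assign each row to a partition by a stable hash of its sort key."""
    buckets = [[] for _ in range(partitions)]
    for row in rows:
        index = zlib.crc32(str(key(row)).encode("utf-8")) % partitions
        buckets[index].append(row)
    return buckets


def sort_partitions(buckets, key):
    """Sort every partition independently; each sort could run on its own node."""
    return [sorted(bucket, key=key) for bucket in buckets]


def merge_sorted(buckets, key):
    """Merge the already-sorted partitions into one globally ordered list."""
    return list(heapq.merge(*buckets, key=key))


if __name__ == "__main__":
    rows = [("b", 2), ("a", 1), ("c", 3), ("a", 0)]
    by_name = lambda row: row[0]
    parts = sort_partitions(hash_partition(rows, by_name, 2), by_name)
    print(merge_sorted(parts, by_name))
```

Because rows with the same key always hash to the same partition, each partition can be sorted without looking at the others, and the final merge is a cheap pass over data that is already in order.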
As for why tsort is faster than a plain UNIX sort, it sorts each partition (assuming you have a config file with multiple nodes defined). And unless you write an elaborate shell script, sort won't do that on its own. Back in PX 6.x, you could call SyncSort directly from the sort stage, but most shops couldn't afford licenses for both SyncSort and PX, so they took that functionality out at 7.0.