Hi
I want to know which is the better way to use the sorter:
1) Have all the rows flow through a single sorter.
2) Split the rows and use several sorters in the same place, each sorting its own share of the rows.
Is there any impact on memory allocation when using 2 sorters in place of 1?
Can anyone advise?
Thanks
Using sorter in parallel
Moderators: chulett, rschirm, roy
Lakshmi
Hi,
One sort is the way to go.
System sort or 3rd party sort utilities are faster than DS sort.
IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
EE sort
I think your question is about sorting data in parallel streams? If so, it is better to partition your data on your sort criteria and sort each partition separately. You should see a substantial improvement in performance, since each sort handles only a fraction of the rows and the sorts run at the same time.
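To make the partition-then-sort idea concrete, here is a minimal sketch in plain Python rather than DataStage. It assumes hash partitioning on an integer sort key; the function names (hash_partition, sort_partitions, merge_sorted) are made up for illustration and are not part of any DataStage API. The partitions are sorted one after another here, whereas in a parallel job each would be sorted on its own node.

```python
import heapq


def hash_partition(rows, key, num_partitions):
    """Route each row to a partition chosen by hashing its sort key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[hash(key(row)) % num_partitions].append(row)
    return partitions


def sort_partitions(partitions, key):
    """Sort every partition independently (one sort per node in a real job)."""
    return [sorted(part, key=key) for part in partitions]


def merge_sorted(partitions, key):
    """Collect already-sorted partitions into one totally ordered stream."""
    return list(heapq.merge(*partitions, key=key))


def by_id(row):
    return row[0]


rows = [(7, "g"), (2, "b"), (9, "i"), (4, "d"), (1, "a"), (6, "f")]
parts = sort_partitions(hash_partition(rows, by_id, 3), by_id)
print(parts)
print(merge_sorted(parts, by_id))
```

Note that hash partitioning only guarantees that rows with the same key land in the same partition and that each partition is sorted. The final merge step is needed only if you want a single, globally ordered output.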
Of course there is a memory impact from using multiple sorts (especially if your partitions are unbalanced), but with the EE sort remember to set your minimum memory allocation to something reasonable (e.g. 20 MB), otherwise it will swap out to disk. On a 3-node parallel configuration, this implies 60 MB of sort memory is required. Also, if you're able to use runtime column propagation (RCP), I believe your sort memory requirements will be reduced.
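The memory arithmetic above can be written out as a small helper. This is only a back-of-the-envelope sketch: the function name and the sorts_per_node parameter are hypothetical, and it assumes every sort on every node gets its own allocation of the same size, which is how I read the 20 MB x 3 nodes = 60 MB figure.

```python
def total_sort_memory_mb(per_sort_mb, nodes, sorts_per_node=1):
    """Estimate total sort memory, assuming one equal allocation per sort per node."""
    return per_sort_mb * nodes * sorts_per_node


# One sort on a 3-node configuration, 20 MB each (the figure from the post).
print(total_sort_memory_mb(20, 3))

# Two sorts in the same job under the same assumption.
print(total_sort_memory_mb(20, 3, sorts_per_node=2))
```

Under that assumption, replacing 1 sorter with 2 doubles the estimate, which is the memory impact the original question asks about.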
roy wrote: system sort or 3rd party sort utilities are faster than DS sort.
You may be correct with Server's Sort, but not with EE's tsort. There is obviously some proprietary sorting technology out there that costs hundreds of thousands of dollars to license, but among the available tools, tsort is very fast. VERY fast. It is also optimized for use within DataStage EE, making it as fast as or faster than the general purpose sorting tools you may find.
Do remember that the framework will automatically insert sorts in certain situations (read your documentation for further details), so minimizing repetitive actions will minimize the internal sorting done. Also, less is better: fewer sorts = faster job, as sorting is a blocking technique.
Blocking technique = one where data cannot move on to the next stage until all of the data has been gathered and processed.
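A small illustration of what "blocking" means, using Python generators as stand-ins for stages. It has nothing to do with DataStage internals; the stage names and the event log are invented for the example. A pass-through stage emits each row as soon as it is read, while a sort has to read every row before it can emit the first one.

```python
def source(rows, log):
    """Upstream stage: records each row as it is read."""
    for row in rows:
        log.append(("read", row))
        yield row


def passthrough(stream, log):
    """Non-blocking stage: emits each row as soon as it arrives."""
    for row in stream:
        log.append(("emit", row))
        yield row


def blocking_sort(stream, log):
    """Blocking stage: must consume the whole input before emitting anything."""
    buffered = sorted(stream)
    for row in buffered:
        log.append(("emit", row))
        yield row


def run(stage, rows):
    """Drive a stage to completion and return the order of read/emit events."""
    log = []
    list(stage(source(rows, log), log))
    return log


print(run(passthrough, [3, 1, 2]))    # reads and emits interleave
print(run(blocking_sort, [3, 1, 2]))  # all reads happen before any emit
```

In the second run nothing downstream can start until the last row has been read, which is why every extra sort in a job adds a point where the pipeline stalls.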