Using sorter in parallel

lakshmipriya
Participant
Posts: 31
Joined: Tue Jul 13, 2004 5:26 am
Location: chennai

Post by lakshmipriya »

Hi

I would like to know which is the better way to use the sorter:

1) Have all the rows flow through a single sorter.
2) Use several sorters in the same place, splitting the rows and sorting each subset separately.

Is there any impact on memory allocation when using 2 sorters in place of 1?

Can anyone advise?

Thanks
Lakshmi
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
A single sort is the way to go.
System sort or 3rd-party sort utilities are faster than the DS sort.
IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

battaliou
Participant
Posts: 155
Joined: Mon Feb 24, 2003 7:28 am
Location: London

EE sort

Post by battaliou »

I think your question is about sorting data in parallel streams? If so, it is better to partition your data along your sort criteria and sort each partition separately. You'll find a large improvement in performance.
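To illustrate the partition-then-sort idea in plain Python (this is only a sketch, not DataStage code): rows are hash-partitioned on the sort key so that equal keys always land in the same partition, and each partition is then sorted on its own. The names `partition_index`, `hash_partition`, `sort_each_partition` and the `cust_id` column are made up for the example. Note that the result is sorted within each partition, not across the whole data set.

```python
import zlib


def partition_index(value, num_partitions):
    # Deterministic hash (crc32) so the same key always maps to the same partition.
    return zlib.crc32(repr(value).encode("utf-8")) % num_partitions


def hash_partition(rows, key, num_partitions):
    # Distribute rows across partitions based on the hash of the sort key.
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[partition_index(row[key], num_partitions)].append(row)
    return partitions


def sort_each_partition(partitions, key):
    # Each partition is sorted independently; in a parallel engine these
    # sorts would run concurrently, one per node.
    return [sorted(part, key=lambda row: row[key]) for part in partitions]


# Hypothetical sample data.
rows = [
    {"cust_id": 42, "amount": 10},
    {"cust_id": 7, "amount": 25},
    {"cust_id": 42, "amount": 5},
    {"cust_id": 19, "amount": 80},
    {"cust_id": 7, "amount": 1},
    {"cust_id": 3, "amount": 60},
]

sorted_parts = sort_each_partition(hash_partition(rows, "cust_id", 3), "cust_id")
```

Because every partition only has to sort its own share of the rows, each individual sort is smaller, which is where the speed-up comes from.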

Of course there is a memory impact from using multiple sorts (especially if your partitions are unbalanced), but in the EE sort remember to set your minimum memory allocation to something reasonable (e.g. 20 MB), otherwise it will swap out to disk. On a 3-node parallel configuration, this implies 60 MB of sort memory required. Also, if you're able to use runtime column propagation (RCP), I believe your sort memory requirements will be reduced.
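The arithmetic behind the 60 MB figure can be written out as a small sketch. The function name and the `sorts_per_node` parameter are my own additions for illustration (assuming each sort on each node gets its own allocation); they are not DataStage settings.

```python
def total_sort_memory_mb(per_sort_mb, nodes, sorts_per_node=1):
    # Assumption: every sort instance on every node holds its own buffer.
    return per_sort_mb * nodes * sorts_per_node


one_sorter = total_sort_memory_mb(20, 3)       # 20 MB x 3 nodes
two_sorters = total_sort_memory_mb(20, 3, 2)   # same, with 2 sorts per node
print(one_sorter, two_sorters)
```

Under that assumption, putting 2 sorters in place of 1 doubles the memory needed for the same per-sort allocation.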
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

roy wrote: system sort or 3rd party sort utilities are faster than DS sort.
You may be correct with Server's Sort, but not with EE's tsort. There is obviously some proprietary sorting technology out there that you pay hundreds of thousands of dollars in license fees for, but among the available tools, tsort is very fast. VERY fast. It is also optimized for use within DataStage EE, making it as fast as or faster than the general-purpose sorting tools you may find.

Do remember that the framework will automatically insert sorts in certain situations (read the documentation for further details), so minimizing repetitive actions will minimize the internal sorting done. Also, less is better -- fewer sorts = faster job, as sorting is a blocking technique.

Blocking technique = one where data cannot move on until all of it has been gathered and processed.
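A small Python sketch of what "blocking" means, using generators (again, just an illustration, not how the EE framework is implemented; all names are made up). The pass-through stage hands on each row as soon as it reads it, while the sort stage has to read every row before it can emit the first one.

```python
def source(rows, log):
    # Upstream stage: records each row as it is read.
    for row in rows:
        log.append(f"read {row}")
        yield row


def passthrough(rows, log):
    # Non-blocking stage: emits each row as soon as it arrives.
    for row in rows:
        log.append(f"emit {row}")
        yield row


def blocking_sort(rows, log):
    # Blocking stage: sorted() must consume the entire input first.
    buffered = sorted(rows)
    for row in buffered:
        log.append(f"emit {row}")
        yield row


stream_log = []
first_streamed = next(passthrough(source([3, 1, 2], stream_log), stream_log))

sort_log = []
first_sorted = next(blocking_sort(source([3, 1, 2], sort_log), sort_log))

print(stream_log)
print(sort_log)
```

Comparing the two logs shows that the pass-through stage needed only one read before its first emit, whereas the sort needed all of them.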