How to replace sort stage for huge volume data?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

swapna07
Participant
Posts: 22
Joined: Fri Mar 08, 2013 11:29 am


Post by swapna07 »

Hi All,

I have a job that processes around 89 million records. The job design is as follows:

Seq ---> Transformer ---> Funnel ---> Sort ---> Transformer ---> 2 outputs (Seq, XML)

Up to the Sort stage the job takes 30 mins, but sorting this huge volume of data takes 2 hrs 15 mins. :cry:
Can anyone help me cut the execution time roughly in half?

Thanks in advance.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Pretty arbitrary requirement - cut the time in half. Seems to me you'd have to have access to a faster sort, something third party like SyncSort perhaps to accomplish that.

How many nodes does the job run on? You could try experimenting with that... if your source file allows parallel reads that may help.
-craig

"You can never have too many knives" -- Logan Nine Fingers
swapna07
Participant
Posts: 22
Joined: Fri Mar 08, 2013 11:29 am

Post by swapna07 »

This job runs on 8 nodes. I tried to increase the buffer size in the Sort stage by setting auto-buffer, but it is not helping. This is one job in a larger interface: the whole interface takes 3 hrs 45 mins to process 88 million records, and this job alone consumes most of that time. The business has come back asking us to reduce the job run time. I really don't know what to do!! :cry: :cry:

Let me know in case you can help. Thanks in advance.
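To put "half" into concrete numbers, here is a rough throughput calculation. It assumes the 2 hrs 15 mins is spent in the sort alone, uses the 89 million figure from the first post, and spreads the work evenly over the 8 nodes; real per-node load depends on partitioning and skew, so treat these as ballpark figures only.

```python
# Ballpark throughput for the sort step described in this thread.
# Assumptions: 89 million records, 2 h 15 min spent in the Sort stage alone,
# and work spread evenly across the 8 nodes in the configuration file.

RECORDS = 89_000_000
SORT_SECONDS = 2 * 3600 + 15 * 60  # 2 h 15 min expressed in seconds
NODES = 8


def throughput(records: int, seconds: float, nodes: int = 1) -> float:
    """Records per second, divided across `nodes` if given."""
    return records / seconds / nodes


current_total = throughput(RECORDS, SORT_SECONDS)
current_per_node = throughput(RECORDS, SORT_SECONDS, NODES)

# Halving the run time means doubling the throughput.
target_seconds = SORT_SECONDS / 2
target_total = throughput(RECORDS, target_seconds)

if __name__ == "__main__":
    print(f"current: {current_total:,.0f} rec/s total, {current_per_node:,.0f} rec/s per node")
    print(f"target:  {target_seconds / 60:.1f} min, {target_total:,.0f} rec/s total")
```

Roughly 11,000 records per second today versus about 22,000 needed, which is why the replies below focus on memory, disk I/O and partitioning rather than small tweaks.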
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

Can you tell us what you want to achieve by sorting the data? How many keys are in the Sort stage? What partition type is used?
Thanks,
Prasanna
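The partitioning question matters because a parallel sort normally sorts each partition independently. A common pattern is to hash-partition on the sort keys so that equal keys land on the same node, sort each partition, and, if a single ordered stream is needed for a sequential target, merge the sorted partitions at the end. The sketch below illustrates that generic pattern in plain Python; it is not DataStage's implementation. Python's `hash()` stands in for the engine's hash function, and `cust_id` is a hypothetical key column.

```python
# Generic hash-partition -> per-partition sort -> sort-merge pattern.
# Assumptions: rows are dicts, `cust_id` is a hypothetical key column,
# and Python's hash() stands in for the engine's partitioning hash.
from collections import defaultdict
import heapq


def hash_partition(rows, key, nodes):
    """Send each row to partition hash(key) % nodes, so equal keys share a partition."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key]) % nodes].append(row)
    return [parts[i] for i in range(nodes)]


def sort_partitions(partitions, key):
    """Sort each partition independently, as each node would in parallel."""
    return [sorted(p, key=lambda r: r[key]) for p in partitions]


def sort_merge_collect(partitions, key):
    """Merge already-sorted partitions into one ordered stream for a sequential target."""
    return list(heapq.merge(*partitions, key=lambda r: r[key]))


if __name__ == "__main__":
    rows = [{"cust_id": i % 7, "amount": i} for i in range(20)]
    parts = sort_partitions(hash_partition(rows, "cust_id", 3), "cust_id")
    for n, p in enumerate(parts):
        print(n, [r["cust_id"] for r in p])
    print([r["cust_id"] for r in sort_merge_collect(parts, "cust_id")])
```

Each node only sorts its own share, so the per-node cost shrinks as nodes are added, provided the keys hash evenly; a heavily skewed key leaves one node doing most of the work.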
eph
Premium Member
Posts: 110
Joined: Mon Oct 18, 2010 10:25 am

Post by eph »

Hi,

Have you tried the "Restrict memory usage" option in the Sort stage? Giving it a higher value should reduce the amount of data landing on disk.
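Why a bigger memory budget helps can be seen from a generic external merge sort: data is sorted in memory-sized runs, each run is spilled to a temporary file, and the runs are then merged. The larger the in-memory run, the fewer runs are written and re-read. This is a minimal sketch of that textbook technique, not the Sort stage's internals; `max_in_memory` (counted in records, not MB) is a hypothetical stand-in for the "Restrict memory usage" setting, and the records are plain integers.

```python
# Minimal external merge sort: sort runs that fit within a memory budget,
# spill each sorted run to a temp file, then k-way merge the runs.
# Assumption: `max_in_memory` (records per run) is a hypothetical stand-in
# for the Sort stage's "Restrict memory usage" setting; records are ints.
import heapq
import tempfile


def _spill(sorted_run):
    """Write one sorted run to a temporary file and rewind it for reading."""
    f = tempfile.TemporaryFile(mode="w+")
    for value in sorted_run:
        f.write(f"{value}\n")
    f.seek(0)
    return f


def _read(f):
    """Stream a spilled run back as integers."""
    for line in f:
        yield int(line)


def external_sort(values, max_in_memory):
    """Return (sorted values, number of runs spilled to disk)."""
    runs, buf = [], []
    for v in values:
        buf.append(v)
        if len(buf) >= max_in_memory:
            runs.append(_spill(sorted(buf)))
            buf = []
    if buf:
        runs.append(_spill(sorted(buf)))
    try:
        merged = list(heapq.merge(*(_read(f) for f in runs)))
    finally:
        for f in runs:
            f.close()
    return merged, len(runs)


if __name__ == "__main__":
    data = [(i * 37) % 1001 for i in range(1000)]
    for budget in (100, 500):
        _, n_runs = external_sort(data, budget)
        print(f"budget {budget}: {n_runs} runs spilled")
```

With five times the budget, five times fewer runs hit the disk, which is the same lever the option in the Sort stage pulls.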

Did you check the job score to verify that no inline sort is being inserted automatically at runtime?

Also check the I/O performance of the scratch disk volumes the sort uses, as configured in the APT_CONFIG_FILE.

Eric