Hi All,
I have a job that processes around 89 million records. The job design is:
Seq ---> Transf ---> Funnel ---> Sort stage ---> Transf ---> 2 outputs (Seq, XML)
Everything up to the sort stage takes 30 mins, but sorting this volume of data takes 2 hrs 15 mins.
Can anyone help reduce the execution time by about half?
Thanks in advance.
How to replace sort stage for huge volume data?
Pretty arbitrary requirement - cut the time in half. Seems to me you'd have to have access to a faster sort, something third party like SyncSort perhaps to accomplish that.
How many nodes does the job run on? You could try experimenting with that... if your source file allows parallel reads that may help.
-craig
"You can never have too many knives" -- Logan Nine Fingers
This job runs on 8 nodes. I tried to increase the buffer size in the sort stage by setting auto-buffer, but it did not help. This is one job in a larger interface: the whole interface takes 3 hrs 45 mins to process 88 million records, and this job alone accounts for most of that time. The business has come back asking us to reduce the job's run time. I really don't know what to do!
Please let me know if you can help. Thanks in advance.
Hi,
Have you tried the "Restrict memory usage" option in the sort stage? You could set a higher value to reduce the amount of data landing on disk.
Have you checked the job score to verify that no inline sort is being inserted automatically at runtime?
Also check the I/O of the sorting volume configured in APT_CONFIG_FILE.
Eric
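
To illustrate why a higher memory limit can reduce the data landing on disk, here is a minimal Python sketch of a generic external merge sort. It is not DataStage's sort implementation; the names `external_sort` and `max_in_memory` are hypothetical and exist only for this example. The point is just that the fewer records fit in memory at once, the more sorted runs must be written to temporary files and merged back.

```python
import heapq
import os
import pickle
import tempfile
from itertools import islice


def _write_run(sorted_chunk, tmp_dir):
    """Write one sorted chunk to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run")
    with os.fdopen(fd, "wb") as f:
        for rec in sorted_chunk:
            pickle.dump(rec, f)
    return path


def _read_run(path):
    """Stream the records of one sorted run back from disk."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def external_sort(records, max_in_memory):
    """Sort records while sorting at most max_in_memory of them at a time.

    Returns (sorted_list, spilled_runs). spilled_runs is the number of
    sorted runs written to temporary files; it is 0 when the whole input
    fits within the memory limit. (Hypothetical helper for illustration.)
    """
    if max_in_memory < 1:
        raise ValueError("max_in_memory must be at least 1")
    it = iter(records)
    with tempfile.TemporaryDirectory() as tmp_dir:
        run_paths = []
        chunk = sorted(islice(it, max_in_memory))
        while True:
            next_chunk = sorted(islice(it, max_in_memory))
            if not next_chunk and not run_paths:
                # The whole input fit in one chunk: no spill needed.
                return chunk, 0
            run_paths.append(_write_run(chunk, tmp_dir))
            if not next_chunk:
                break
            chunk = next_chunk
        # k-way merge of the sorted runs read back from disk.
        merged = list(heapq.merge(*(_read_run(p) for p in run_paths)))
        return merged, len(run_paths)


if __name__ == "__main__":
    import random

    data = [random.randint(0, 1_000_000) for _ in range(10_000)]
    for limit in (500, 2_500, 10_000):
        _, runs = external_sort(data, limit)
        print(f"memory limit {limit:>6} records -> {runs} run(s) spilled to disk")
```

In this sketch, raising the limit until the input fits in one chunk removes the spill entirely. Whether the same gain is achievable in the real job depends on the memory available per node and on the I/O of the volume used for sorting, which is why those settings are worth checking together.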