
Performance tuning in sort stage.

Posted: Sun Dec 21, 2008 11:49 am
by santhob
Hi,
I am working in DataStage 7.5. One of our jobs processes more than 400 million records: it reads data from a sequential file and a UDB stage and feeds it to a UDB stage. We use Copy, Filter and Sort stages in between to do business validation. I need to reduce the run time of the job, which usually takes 5 to 6 hours or more.

The Sort stage might be taking a significant part of that time, so I am looking for something to change in it. The Sort stage sorts the data in ascending order on a key and removes duplicates. Is it possible to reduce the time it takes?

There is one option on the Properties tab of the Sort stage:
Restrict Memory Usage, which is set to its default.
Will increasing Restrict Memory Usage improve the performance of the job?

Please advise me on how to tune the performance.

Re: Performance tuning in sort stage.

Posted: Tue Dec 23, 2008 11:03 pm
by veera24
Hi,
You can try this command in the Transformer stage's properties:

sort -t"~" -k1,1 -k2,2 -k3,3 -k4,4 -k5,5 -k6,6 -k7,7 FILE1 > FILE2

Here:
~ ----> the delimiter (change it to match your delimiter)
-k1,1 -k2,2 etc. ----> the key columns you want to sort on
FILE1 ----> the input file name
FILE2 ----> the output file name
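
As a rough illustration of the command above, here is a small self-contained sketch. The sample rows and the three-column layout are made up for the example and are not from the actual job. The -u option is an addition (an assumption about what the job needs): it keeps only one line per key combination, which mirrors the remove-duplicates requirement. LC_ALL=C is set only so that the ordering is byte-wise and predictable.

```shell
# Hypothetical sample data: three "~"-delimited columns (not from the real job).
printf '%s\n' 'b~2~x' 'a~1~y' 'b~2~x' 'a~1~z' > FILE1

# Sort on columns 1 and 2 in ascending order.
# -u keeps a single line for each distinct (column 1, column 2) key.
LC_ALL=C sort -t'~' -k1,1 -k2,2 -u FILE1 > FILE2

cat FILE2
```

FILE2 ends up with one line for key a~1 followed by one line for key b~2. Which of the two a~1 rows survives is up to the sort implementation, so do not rely on it if the non-key columns matter.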

But I'm not sure whether it will work in the parallel edition, because I have only used this command in the Server edition. If it works in parallel too, kindly let me know.

Thanks,
Veera

Posted: Wed Dec 24, 2008 3:35 am
by ray.wurlod
"might be"? Have you measured anything?

Increasing the amount of memory allocated to the Sort stage may help, provided that you have spare memory capacity.
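
One way to see the memory trade-off, and to measure rather than guess, is with the Unix sort command mentioned earlier in the thread. This is only an analogy, not the DataStage Sort stage itself: the sketch assumes GNU sort, whose -S option sets the in-memory buffer size, and the file names and data volume are made up. With a buffer smaller than the input, sort has to spill to temporary files and merge them; with a larger one it can do more of the work in memory.

```shell
# Hypothetical test data: 200000 pseudo-random integers, one per line.
awk 'BEGIN { srand(1); for (i = 0; i < 200000; i++) print int(rand() * 1000000) }' > nums.txt

# Same sort with a small buffer (-S is a GNU sort option).
start=$(date +%s)
sort -n -S 1M nums.txt > small_buf.txt
end=$(date +%s)
echo "1M buffer:  $((end - start)) s"

# Same sort with a larger buffer.
start=$(date +%s)
sort -n -S 64M nums.txt > large_buf.txt
end=$(date +%s)
echo "64M buffer: $((end - start)) s"

# The result must not depend on the buffer size.
cmp small_buf.txt large_buf.txt && echo "outputs identical"
```

At this tiny volume both runs will likely report 0 or 1 seconds; the difference only becomes visible with much larger inputs, and only if the machine really has the spare memory.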