
memory issues with Sort stage

Posted: Wed Sep 05, 2007 10:25 am
by pavan_test
Hi,

I have a Sort stage in my DataStage job. My input file is 2 TB in size. Can anyone please tell me whether I can use the Sort stage to sort my input data?

What performance and memory issues should I expect if I use the Sort stage to sort such a huge file on a key column?

Any suggestions?

Regards,
Mark

Re: memory issues with Sort stage

Posted: Wed Sep 05, 2007 10:48 am
by sud
For a 2 TB file I would always go for UNIX sort. If you do use DataStage, use the UNIX sort option.

Posted: Wed Sep 05, 2007 11:50 am
by shamshad
Would it be possible to run a before routine (shell script) that simply sorts your file and saves the sorted input to another file? You could then read the sorted file in DataStage as a source.

UNIX handles sorting very efficiently. We saw a considerable improvement when sorting in UNIX compared with DataStage. We were originally sorting a text file and then removing duplicates in a DataStage job. When we did the same thing in UNIX, it took less time.

It's more of a design question: how much logic can one keep within DataStage?
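
For anyone who wants to try the pre-sort approach described above, here is a minimal sketch of a wrapper around the system sort command. It is an illustration only, not something taken from this thread: the function names, the pipe delimiter, the key position and the default scratch directory and buffer size are all assumptions. The -T and -S options are GNU sort options, so check the man page on your platform.

```python
import os
import subprocess


def build_sort_command(src, dst, key_field, delimiter="|",
                       scratch_dir="/tmp", buffer_size="4G", unique=False):
    """Build a sort command line (hypothetical helper; defaults are assumptions)."""
    cmd = [
        "sort",
        "-t", delimiter,                   # field separator
        "-k", f"{key_field},{key_field}",  # sort on this one field only
        "-T", scratch_dir,                 # where temporary run files go (GNU sort)
        "-S", buffer_size,                 # in-memory buffer size (GNU sort)
        "-o", dst,                         # write the result to dst
    ]
    if unique:
        # With -k, -u keeps one record per key value, not one per distinct row.
        cmd.append("-u")
    cmd.append(src)
    return cmd


def sort_file(src, dst, key_field, **options):
    """Run the sort and raise CalledProcessError if it fails."""
    # LC_ALL=C makes sort compare raw bytes, which is usually faster.
    env = {**os.environ, "LC_ALL": "C"}
    subprocess.run(build_sort_command(src, dst, key_field, **options),
                   check=True, env=env)
```

A call such as sort_file("input.txt", "sorted.txt", 1, scratch_dir="/scratch1") could be run from a script before the job starts. The scratch directory needs enough free space for the temporary files, which for a 2 TB input can be about as large as the input itself.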

Posted: Wed Sep 05, 2007 4:31 pm
by ray.wurlod
In version 7.5 and later, the DataStage Sort stage will outperform UNIX sort.

Make sure that you have PLENTY of scratch disk configured to sort a file of this size. Use multiple file systems per partition for scratch disk to improve disk I/O throughput. More is better.
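
To show why scratch space matters for a sort this large, here is a generic external merge sort sketch. It is not how the DataStage Sort stage is implemented internally. It only illustrates the general technique of sorting chunks that fit in memory, spilling each sorted run to scratch disk, and merging the runs. The function name and parameters are assumptions made for the example.

```python
import heapq
import itertools
import os
import tempfile


def external_sort(lines, scratch_dirs, max_lines_in_memory, key=None):
    """Yield newline-terminated lines in sorted order using bounded memory."""
    run_paths = []
    source = iter(lines)
    for run_no in itertools.count():
        # Read only as many lines as are allowed in memory at once.
        chunk = list(itertools.islice(source, max_lines_in_memory))
        if not chunk:
            break
        chunk.sort(key=key)
        # Spread the runs round-robin over the scratch directories so that
        # several file systems share the I/O load.
        scratch = scratch_dirs[run_no % len(scratch_dirs)]
        fd, path = tempfile.mkstemp(dir=scratch, suffix=".run")
        with os.fdopen(fd, "w") as run_file:
            run_file.writelines(chunk)
        run_paths.append(path)

    # Merge all sorted runs; only one line per run is held in memory.
    run_files = [open(path) for path in run_paths]
    try:
        yield from heapq.merge(*run_files, key=key)
    finally:
        for run_file in run_files:
            run_file.close()
        for path in run_paths:
            os.remove(path)
```

Every input line is written to scratch once before the merge, so in this sketch the scratch directories together need roughly as much free space as the input. That is the reason a 2 TB sort needs plenty of scratch disk.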