memory issues with Sort stage

pavan_test · Post by **pavan_test** » Wed Sep 05, 2007 10:25 am

Hi,

I have a Sort stage in my DataStage job. My input file
is 2 TB in size. Can anyone please tell me whether
I can use the Sort stage to sort
my input data?

What about the job's performance and memory usage
if I use the Sort stage to sort such a huge file on a
key column?

Any suggestions?

Regards
Mark

sud · Post by **sud** » Wed Sep 05, 2007 10:48 am

For a 2 TB file I would always go for UNIX sort. If you do the sort inside DataStage, use its UNIX sort option.

shamshad · Post by **shamshad** » Wed Sep 05, 2007 11:50 am

Would it be possible to run a before-job routine (a shell script) that simply sorts your file and saves the sorted input to another file? Then you can read the sorted file in DataStage as a source.

UNIX handles sorting very efficiently. We saw a considerable improvement when sorting in UNIX compared to DataStage. We were originally sorting a text file and then removing duplicates in a DataStage job. When we did the same thing in UNIX, it took less time.

It's more of a design question how much logic one can keep within DataStage.
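As a rough illustration of the pre-sort step described above, here is a minimal Python sketch that builds and runs a UNIX `sort` command on one delimited key column. The helper names (`build_sort_command`, `presort`), the comma delimiter, and the paths are hypothetical; `-t`, `-k`, `-u` and `-o` are standard `sort` options, and `-T` (temporary directory) is widely supported but worth checking on your platform.

```python
import subprocess


def build_sort_command(src, dest, key_field=1, delimiter=",",
                       tmp_dir=None, unique=False):
    """Build the argument list for a UNIX `sort` on one delimited key column.

    All parameter names here are illustrative assumptions, not DataStage settings.
    """
    cmd = ["sort", "-t", delimiter, "-k", f"{key_field},{key_field}"]
    if unique:
        # With -k, -u keeps only one line per distinct key value.
        cmd.append("-u")
    if tmp_dir:
        # Directory where sort writes its temporary files when data exceeds memory.
        cmd += ["-T", tmp_dir]
    cmd += ["-o", dest, src]
    return cmd


def presort(src, dest, **options):
    """Run the sort and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_sort_command(src, dest, **options), check=True)
```

For example, `presort("input.txt", "sorted.txt", key_field=2, delimiter="|", tmp_dir="/scratch")` would sort on the second pipe-delimited column; the DataStage job would then read `sorted.txt` as its source. The same command line could just as well live in the shell script itself.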

ray.wurlod · Post by **ray.wurlod** » Wed Sep 05, 2007 4:31 pm

In version 7.5 and later, DataStage Sort stage will outperform UNIX sort.

Make sure that you have PLENTY of scratch disk configured, to sort a file of this size. Use multiple file systems per partition for scratch disk, to improve disk I/O throughput when using scratch disk. More is better.
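To show why scratch disk matters for a sort this large, here is a minimal sketch of a generic external merge sort in Python. It is not DataStage's actual implementation, only an illustration of the general technique: sorted runs that do not fit in memory are spilled to scratch directories and then merged. The function names, the `max_lines` limit, and the round-robin use of `scratch_dirs` are all assumptions made for the example.

```python
import heapq
import os
import tempfile


def _spill(sorted_lines, scratch_dir):
    """Write one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(dir=scratch_dir, suffix=".run")
    with os.fdopen(fd, "w") as run:
        run.writelines(sorted_lines)
    return path


def external_sort(src, dest, key, scratch_dirs, max_lines=1_000_000):
    """Sort src into dest while buffering at most max_lines lines in memory."""
    runs, chunk = [], []

    def flush():
        # Rotate across scratch directories to spread the disk I/O.
        scratch = scratch_dirs[len(runs) % len(scratch_dirs)]
        runs.append(_spill(sorted(chunk, key=key), scratch))
        chunk.clear()

    with open(src) as infile:
        for line in infile:
            if not line.endswith("\n"):
                line += "\n"  # normalise a final line that lacks a newline
            chunk.append(line)
            if len(chunk) >= max_lines:
                flush()
    if chunk:
        flush()

    files = [open(path) for path in runs]
    try:
        with open(dest, "w") as out:
            # heapq.merge streams the runs, holding one line per run in memory.
            out.writelines(heapq.merge(*files, key=key))
    finally:
        for f in files:
            f.close()
        for path in runs:
            os.remove(path)
```

In this sketch the spilled runs together occupy about as much space as the input, so a 2 TB file would need scratch space of that order, which is the reason for configuring plenty of it across several file systems.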

DSXchange
