Hi,
I have a Sort stage in my DataStage job. My input file is 2 TB. Can anyone
advise whether I can use the Sort stage to sort
my input data?
What about the performance of the job, and memory issues,
if I use the Sort stage to sort such a huge file on a
key column?
Any suggestions?
Regards
Mark
memory issues with Sort stage
For a 2 TB file I would always go with Unix sort. If you do the sort inside DataStage, use the Sort stage's Unix sort option.
Would it be possible to run a before routine (a shell script) that simply sorts your file and saves the sorted input to another file? You could then read the sorted file in DataStage as a source.
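As a rough illustration of the before-routine approach, here is a Python wrapper that a pre-processing script could use to call the standard Unix `sort` utility and write the sorted data to a separate file. The function name, file names, comma delimiter and key column are hypothetical assumptions; `-t`, `-k` and `-o` are standard `sort` options, and `LC_ALL=C` forces plain byte-order comparison.

```python
import os
import subprocess


def unix_sort_file(src: str, dst: str, key_col: int = 1, delim: str = ",") -> None:
    """Sort a delimited text file on one key column using the Unix `sort`
    utility, writing the result to a separate output file.

    Hypothetical helper: file names, delimiter and key column are assumptions
    and must match the real input layout.
    """
    # Byte-wise collation: predictable ordering and usually faster than locale rules.
    env = dict(os.environ, LC_ALL="C")
    cmd = [
        "sort",
        "-t", delim,                   # field separator
        "-k", f"{key_col},{key_col}",  # compare on this one field (1-based)
        "-o", dst,                     # write sorted output to dst
        src,
    ]
    # check=True raises CalledProcessError if sort exits non-zero.
    subprocess.run(cmd, check=True, env=env)
```

For a file of this size, GNU `sort` also accepts `-T DIR` (repeatable) to place its temporary files on file systems with enough free space, and `-S` to set the size of its main-memory buffer.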
Unix handles sorting very efficiently. We saw a considerable improvement when sorting in Unix compared to DataStage. We were originally sorting a text file and then removing duplicates in a DataStage job; doing the same thing in Unix took less time.
It's largely a design question of how much logic to keep within DataStage.
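The sort-then-remove-duplicates pattern works because equal keys are adjacent once the data is sorted, so duplicates can be dropped in a single sequential pass. A minimal Python sketch, assuming comma-delimited records and that "duplicate" means "same key column" (keeping the first record per key); the function name is hypothetical. For whole-line duplicates, Unix `sort -u`, or `uniq` on already sorted input, does the same job.

```python
from itertools import groupby
from typing import Iterable, Iterator


def first_per_key(sorted_lines: Iterable[str], key_col: int = 1,
                  delim: str = ",") -> Iterator[str]:
    """Yield the first record for each key from input already sorted on that key.

    Hypothetical helper; key_col is 1-based, like `sort -k`.
    groupby() is lazy, so only the current group's first line is kept,
    not the whole file.
    """
    def key_of(line: str) -> str:
        return line.rstrip("\n").split(delim)[key_col - 1]

    for _, group in groupby(sorted_lines, key=key_of):
        yield next(group)  # first record of each run of equal keys
```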
In version 7.5 and later, the DataStage Sort stage will outperform UNIX sort.
Make sure you have PLENTY of scratch disk configured to sort a file of this size. Use multiple file systems per partition for scratch disk to improve disk I/O throughput. More is better.
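Scratch disk matters because a file larger than memory has to be sorted externally: sorted runs are spilled to disk and then merged. The sketch below is a generic external merge sort in Python, not DataStage's internal implementation; it rotates spill files across several scratch directories, which is the idea behind spreading scratch disk over multiple file systems. The function name, comma delimiter and key column are assumptions for illustration.

```python
import heapq
import itertools
import os
import tempfile
from typing import Iterable, Iterator, List, Sequence


def external_sort(lines: Iterable[str], scratch_dirs: Sequence[str],
                  max_lines_in_memory: int, key_col: int = 1,
                  delim: str = ",") -> Iterator[str]:
    """Generic external merge sort (illustrative only).

    1. Read up to max_lines_in_memory lines, sort them in memory, and
       spill the sorted run to a temporary file.
    2. Rotate spill files round-robin across scratch_dirs (at least one
       directory is required) so runs land on different disks.
    3. Lazily k-way merge all runs with heapq.merge, then delete them.
    """
    def key(line: str) -> str:
        return line.rstrip("\n").split(delim)[key_col - 1]

    run_paths: List[str] = []
    dirs = itertools.cycle(scratch_dirs)
    it = iter(lines)
    while True:
        chunk = list(itertools.islice(it, max_lines_in_memory))
        if not chunk:
            break
        # Ensure every line ends with a newline so runs stay one record per line.
        chunk = [l if l.endswith("\n") else l + "\n" for l in chunk]
        chunk.sort(key=key)
        fd, path = tempfile.mkstemp(suffix=".run", dir=next(dirs))
        with os.fdopen(fd, "w") as f:
            f.writelines(chunk)
        run_paths.append(path)

    files = [open(p) for p in run_paths]
    try:
        yield from heapq.merge(*files, key=key)
    finally:
        for f, p in zip(files, run_paths):
            f.close()
            os.remove(p)  # clean up scratch space
```

For example, iterating over `external_sort(open("input.csv"), ["/scratch1", "/scratch2"], 1_000_000)` would hold at most one million lines in memory at a time; the paths and chunk size are hypothetical. A production sort would also cap how many runs are merged at once (multi-pass merging) to stay within open-file limits.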
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.