Improve Seq Stage Performance

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

rcil
Charter Member
Posts: 70
Joined: Sat Jun 05, 2004 1:37 am

Improve Seq Stage Performance

Post by rcil »

Hello All,

I have a total of three DataStage jobs. The first two are extracts from the database, each joining 5 tables, with a total of 40 columns. In the third job I sort and concatenate those two tab-delimited output files using ExecSH as a before-job routine, and then the job splits the data into four different files based on simple constraints, with simple derivations in each. The concatenated file contains 24 million records.

The first two extracts run at 3000 to 4000 records per second. In the third job the sort command takes only a couple of minutes, but the job itself processes 325 rows per second, which means the process takes hours to complete.
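
For reference, the sort-and-concatenate step done in the ExecSH before-job routine can be sketched roughly like this (the file names here are placeholders, not the poster's actual paths, and the sort key is assumed to be the first column):

```shell
# Placeholder inputs standing in for the two tab-delimited extracts.
printf 'banana\t2\napple\t1\n'  > extract1.txt
printf 'date\t4\ncherry\t3\n'   > extract2.txt

TAB=$(printf '\t')
# Sort each extract on its first column, then concatenate the results.
sort -t "$TAB" -k1,1 extract1.txt > extract1.sorted
sort -t "$TAB" -k1,1 extract2.txt > extract2.sorted
cat extract1.sorted extract2.sorted > combined.txt

wc -l < combined.txt    # 4 lines total in this toy example
```

The Unix sort itself is fast even on large files, which matches the observation that the bottleneck is in the DataStage job reading the combined file, not in this step.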

Is there a way to improve the performance of the Sequential File stage in the way it pulls the records?

thanks
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Your problem is not the sequential stage. Your problem is that you have a single-threaded job design that will only use a single cpu to do its work. If you have 6 cpus, you could use 6 instances of your third job to each handle 1/6th of the source data. In theory, you would scale your throughput up to being done in 1/6th the time.
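
As a sketch of that divide-and-conquer idea (outside DataStage itself, with hypothetical file names), the combined file could be dealt out round-robin into one partition per job instance:

```shell
# Placeholder data standing in for the 24-million-row combined file.
printf 'r1\nr2\nr3\nr4\nr5\nr6\nr7\n' > combined.txt

# Deal the rows out round-robin into 6 partition files,
# part_0.txt .. part_5.txt, one per multi-instance job invocation.
awk '{ print > ("part_" (NR % 6) ".txt") }' combined.txt

wc -l part_*.txt
```

In a real multi-instance design the split is more often done on a key column (for example with a MOD expression in a constraint) so related rows land in the same partition; the round-robin split above is just the simplest illustration of dividing the workload.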

This method of partitioning source data and using multiple instances to divide and conquer has been discussed ad nauseam on this forum. I'll post a link to similar discussions.

viewtopic.php?t=86907
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
rcil
Charter Member
Posts: 70
Joined: Sat Jun 05, 2004 1:37 am

Re: Improve Seq Stage Performance

Post by rcil »

Thank you for the inputs. Since the hashed file size limit is 2GB, and in the UAT environment I have 24 million records (it could be more in production), will a hashed file handle something this big?

thanks
Neoyip
Participant
Posts: 1
Joined: Thu Mar 31, 2005 10:45 am
Location: Toronto, Canada

Re: Improve Seq Stage Performance

Post by Neoyip »

If the hashed file will exceed 2GB, create it manually with HFC.
rcil wrote: Thank you for the inputs. Since the hashed file size limit is 2GB, and in the UAT environment I have 24 million records (it could be more in production), will a hashed file handle something this big?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Re: Improve Seq Stage Performance

Post by kcbland »

rcil wrote: Thank you for the inputs. Since the hashed file size limit is 2GB, and in the UAT environment I have 24 million records (it could be more in production), will a hashed file handle something this big?

Without a doubt, spooling data into a hashed file has significant overhead compared to a sequential file. You may consider reading this post viewtopic.php?t=85364 to learn more about hashed files and when/how to use them.

Regarding your question about size: no, a default 32BIT hashed file will not hold 24 million rows if every row averages 100 characters of data. You would need to use 64BIT hashed files. That said, this approach should be avoided.
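
The arithmetic behind that: 24 million rows at roughly 100 bytes each already overflows a 32-bit file size limit, before any hashed-file overhead is counted.

```shell
# 24 million rows at ~100 bytes per row, against the
# 2^31 - 1 byte (~2.1 GB) limit of a 32-bit file.
ROWS=24000000
BYTES_PER_ROW=100
LIMIT=$(( (1 << 31) - 1 ))
TOTAL=$(( ROWS * BYTES_PER_ROW ))
echo "$TOTAL bytes of raw data vs a limit of $LIMIT"
```

And hashed files store the data less densely than a flat file, so the real on-disk size would be larger still.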

The method I described to you is the one that lets you use more cpus and balance your efforts across multiple cpus.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Sequential File stage is the fastest of the passive stages. It has clever mechanisms, such as look-ahead and buffering, built-in.

On the down side, you can't begin reading from a single sequential file until you've finished writing to it. You can with a hashed file, but it may not be appropriate to do so; this would depend on your design requirements.

On most operating systems there is no effective limit to the size of a sequential file (you may have to enable large sizes, for example by increasing your ulimit and the operating system's maximum file size).
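
For example, on a typical Unix shell you can check the current per-process file-size limit before writing a very large sequential file (the reporting unit varies by shell and OS):

```shell
# Show the file-size limit for processes started from this shell.
# "unlimited" means the shell imposes no per-process cap.
ulimit -f
```

If this reports a finite value smaller than your expected file size, raise it (or have a system administrator raise the hard limit) before the job runs.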

While HFC will not create 64-bit-enabled hashed files, it will generate the commands for doing so.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.