Datastage Performance

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
venkatrranga
Participant
Posts: 15
Joined: Thu Sep 23, 2004 2:18 pm

Datastage Performance

Post by venkatrranga »

Hi,

The out-of-the-box Ascential installation only achieves about 100 rows per second for most of my jobs (simple loads too). I was hoping to find out whether I need to set up any additional parameters to improve job performance.

Regards
Venkat Ranga
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The out of the box Ascential installation is only able to perform 100 rows per second for most my jobs (simple loads too).

Wrong. Create the following job.

Code:

SeqFile  ----->  Transformer  ----->  SeqFile
On the output link of the Transformer stage add @FALSE (system variable) as a constraint expression.
Run the job using a reasonably large sequential file that is on the same machine as the DataStage server. Let us know whether you got more than 100 rows per second.

This job shows you the upper limit of performance on one CPU on your machine.
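If you don't have a large sequential file handy, one quick way to fabricate one on a Unix DataStage server machine (the row layout here is made up purely for illustration; any wide-ish delimited rows will do):

```shell
# Generate a 1,000,000-row delimited test file for the benchmark job.
# The column layout is arbitrary -- adjust the count and padding to taste.
seq 1000000 | awk '{printf "%09d,ROW_%09d,SOME_PADDING_TEXT\n", $1, $1}' > /tmp/bench_input.txt

# Sanity-check the row count before pointing the job at it
wc -l /tmp/bench_input.txt
```

Point the input Sequential File stage at the generated file and divide the row count by the elapsed seconds reported in the job log.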

I suspect that your slowness results from an inefficient load strategy, but let's get the benchmark done first. If you're running Linux on a 233MHz i386 CPU you might see performance this bad, but I suspect you're going to report a figure rather higher than 100 rows per second.

Oh, one other question. Your rows don't happen to be 3MB wide, do they?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
venkatrranga
Participant
Posts: 15
Joined: Thu Sep 23, 2004 2:18 pm

Flat File load

Post by venkatrranga »

Ray,

This is what I found...

When I tried to create a flat file from mainframe DB2, the system was able to create the file at a rate of 600 rows per second...

When I used that same file as input and wrote it out to another flat file on the server, it was processed at a rate of 3,900 rows per second.

It seems that ODBC/the network is taking a long time to pull data out of the mainframe. Please let me know if you have any suggestions for improving ODBC performance.

Regards
Venkat Ranga
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Only the obvious ones. Get a faster ODBC driver. Make your SELECT statement as efficient as possible (for example index any constraining columns).

The answer to your next question is no, I don't have any comparative statistics on ODBC drivers out there. Perhaps you could ask IBM about the fastest method of getting DB2 data out of a mainframe. For example it's possible to expose the tables on a UNIX server, and access these directly. This is now using internal DB2 connectivity rather than trying to get an ODBC driver - a clunky beast at the best of times - to interact with a mainframe.

Another possibility is to have a mainframe process prepare your data and FTP it as a text file to your DataStage server machine. If you have DataStage 390, you can even get DataStage to write the COBOL and the JCL for compiling and running it!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
peternolan9
Participant
Posts: 214
Joined: Mon Feb 23, 2004 2:10 am
Location: Dublin, Ireland
Contact:

Re: Flat File load

Post by peternolan9 »

venkatrranga wrote:Ray,

This is what I found...

When I tried to create a flat file from mainframe DB2, the system was able to create the file at a rate of 600 rows per second...

When I used that same file as input and wrote it out to another flat file on the server, it was processed at a rate of 3,900 rows per second.

It seems that ODBC/the network is taking a long time to pull data out of the mainframe. Please let me know if you have any suggestions for improving ODBC performance.

Regards
Venkat Ranga
Venkat,
I suggest you test just the ODBC performance so that you can separate the ODBC time from the DS processing time. Also, make sure you use arrays if you are doing ODBC calls to a remote host. Are you aware that an ODBC call requires a round trip to the source server for each row?
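The round-trip point can be illustrated with a toy model (plain Python, not a real ODBC driver; `fetch_all` and the numbers are invented for illustration): fetching rows one at a time costs one network round trip per row, while an array fetch amortises that cost across the whole array.

```python
def fetch_all(total_rows, array_size):
    """Round trips needed to pull total_rows rows when the driver
    fetches array_size rows per network call (ceiling division)."""
    return -(-total_rows // array_size)

# 1,000,000 rows fetched singly vs. in arrays of 500
print(fetch_all(1_000_000, 1))    # -> 1000000 (one round trip per row)
print(fetch_all(1_000_000, 500))  # -> 2000
```

With real drivers the same idea typically appears as an array or prefetch size setting on the stage or connection; the latency saving is roughly proportional to the array size until the rowset buffers get large.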

All,
Just for your own education, try turning on the ODBC trace facility and doing a little bit of ODBC work, and you will see just how much traffic to/from the source ODBC generates. When the source is on a remote machine, arrays help a lot. In Informatica a job slows down by a factor of 5-10 when the lookup is to a remote machine, and Informatica 5.1 only had lookups via ODBC, so it forced you to put the data in memory for all dimension tables...
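For example, with a unixODBC driver manager the trace facility can be switched on in the driver-manager configuration (the file location varies by installation; on Windows the same thing lives on the Tracing tab of the ODBC Data Source Administrator):

```ini
; odbcinst.ini -- unixODBC driver-manager section
[ODBC]
Trace     = Yes
TraceFile = /tmp/odbc.trace
```

Remember to turn it off again afterwards; tracing itself slows everything down considerably.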

If your data is remote and large and you want to get it to the DW, it is almost always better to unload it at the source and FTP it, depending on how fast you want the connection to go and how much time you have in your batch window... ;-)

You must trade performance against the ease of getting at data from one server which has its ODBC DSNs set up....
Best Regards
Peter Nolan
www.peternolan.com
venkatrranga
Participant
Posts: 15
Joined: Thu Sep 23, 2004 2:18 pm

Hash File usage

Post by venkatrranga »

Hi,

I was able to improve performance by replacing all lookups with hash file lookups. Throughput went up by 3,000 rows per second.
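The idea behind the hash file trick, sketched in plain Python (the table and names are invented for illustration): the reference data is read once into an in-memory hash structure, and every input row is then resolved with a local probe instead of a per-row database query.

```python
# Reference data loaded once, up front (in DataStage this is the hashed
# file the lookup link reads; here it is just a dict build).
reference_rows = [
    ("US", "United States"),
    ("IE", "Ireland"),
    ("AU", "Australia"),
]
country_lookup = {code: name for code, name in reference_rows}

def resolve(code):
    # O(1) in-memory probe per input row -- no round trip to a database.
    return country_lookup.get(code, "UNKNOWN")

print(resolve("IE"))  # -> Ireland
print(resolve("XX"))  # -> UNKNOWN
```

The one-off cost of building the hash structure is quickly repaid once the input stream is more than a few thousand rows.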

Regards

Venkat Ranga
Post Reply