Page 1 of 1

Datastage Performance

Posted: Tue Jan 11, 2005 2:03 pm
by venkatrranga
Hi,

The out of the box Ascential installation is only able to perform 100 rows per second for most my jobs (simple loads too). I was hoping to findout if I need to setup any additional parameters to improve the job Performance.

Regards
Venkat Ranga

Posted: Tue Jan 11, 2005 3:51 pm
by ray.wurlod
The out of the box Ascential installation is only able to perform 100 rows per second for most my jobs (simple loads too).

Wrong. Create the following job.

Code: Select all

SeqFile  ----->  Transformer  ----->  SeqFile
On the output link of the Transformer stage add @FALSE (system variable) as a constraint expression.
Run the job using a reasonably large sequential file that is on the same machine as the DataStage server. Let us know whether you got more than 100 rows per second.

This job shows you the upper limit of performance on one CPU on your machine.

I suspect that your slowness results from an inefficient load strategy, but let's get the benchmark done first. Maybe, if you're running Linux on a 233MHx I386 CPU, you might see performance this bad, but I suspect you're going to report a figure rather higher than 100 rows per second.

Oh, one other question. Your rows don't happen to be 3MB wide, do they?

Flat File load

Posted: Tue Jan 11, 2005 4:49 pm
by venkatrranga
Ray,

This is what I found...

When I try create a flat file from Mainframe DB2, system was able to create a file at the rate of 600 rows per second...

The same file I used and outputted to other flatfile on the sever, it was processed at the rate of 3900 rows per second.

It seems like ODBC/network is taking long time to pull data out of Mainframe. Please let me know if you have any suggestions for improving ODBC performance.

Regards
Venkat Ranga

Posted: Tue Jan 11, 2005 8:45 pm
by ray.wurlod
Only the obvious ones. Get a faster ODBC driver. Make your SELECT statement as efficient as possible (for example index any constraining columns).

The answer to your next question is no, I don't have any comparative statistics on ODBC drivers out there. Perhaps you could ask IBM about the fastest method of getting DB2 data out of a mainframe. For example it's possible to expose the tables on a UNIX server, and access these directly. This is now using internal DB2 connectivity rather than trying to get an ODBC driver - a clunky beast at the best of times - to interact with a mainframe.

Another possibility is to have a mainframe process prepare your data and FTP it as a text file to your DataStage server machine. If you have DataStage 390, you can even get DataStage to write the COBOL and the JCL for compiling and running it!

Re: Flat File load

Posted: Sat Jan 15, 2005 5:23 am
by peternolan9
venkatrranga wrote:Ray,

This is what I found...

When I try create a flat file from Mainframe DB2, system was able to create a file at the rate of 600 rows per second...

The same file I used and outputted to other flatfile on the sever, it was processed at the rate of 3900 rows per second.

It seems like ODBC/network is taking long time to pull data out of Mainframe. Please let me know if you have any suggestions for improving ODBC performance.

Regards
Venkat Ranga
Venkat,
I suggest you test just the ODBC performance so that you can separate out the DS processing time vs the DS time. Also, make sure you use arrays if you are doing ODBC calls to a remote host. Are you aware that an ODBC Call requires a round trip to the source server for each row?

All,
just for your own education, try turning on the ODBC trace facility and doing a little bit of ODBC work and you will see just how much traffic to/from the source ODBC generates. When it is on a remote machine arrays help a lot. In Informatica a job slows down by a factor of 5-10 when the lookup is to a remote machine and Informatica 5.1 only had lookups via ODBC so it forced you to put the data in memory for all dimension tables...

If your data is remote and large and you want to get it to the DW it is almost always better to unload it at the source and ftp it depending on how fast you want the connection to go and how much time you have in your batch window....;-)

You must trade performance against ease of getting at data from one server which has it's ODBC DSNs set up....

Hash File usage

Posted: Sat Jan 22, 2005 6:16 pm
by venkatrranga
Hi,

I was able to improve the performance by replacing all Lookup as has lookups. The performance went up by 3000 rows per seconds.

Regards

Venkat Ranga