Desperately seeking ... performance-bottlenecks

Archive of postings to DataStageUsers@Oliver.com. This forum is intended only as a reference and cannot be posted to.

Moderators: chulett, rschirm

Locked
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Desperately seeking ... performance-bottlenecks

Post by admin »

Hi there

I've got a DataStage batch consisting of a number of jobs, each with a number of stages of various kinds (nothing particularly fancy, and I've simplified the description slightly for ease of discussion). I really have two related questions:

* How do I readily identify the rate-limiting job(s)? I know the Director log has all the information, but finding it is time-consuming and error-prone, especially when some jobs run in parallel (a rough sketch of one approach follows this list).

* How do I identify the rate-limiting stage(s)? I know the Director monitor shows rows per second on links, but that alone is not enough to understand where limited time and effort are best spent.
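
On the job-level question, a rough first pass is to rank job runs by elapsed time so the slowest jobs stand out. The sketch below assumes a CSV export of job names with start and finish timestamps; the layout and the file name job_runs.csv are assumptions for illustration, not a format the Director produces by itself.

# rank_job_runs.py - rank DataStage job runs by elapsed time so the
# rate-limiting jobs stand out. Assumes a hypothetical CSV export of the
# Director log with columns: job,start,finish (e.g. "2001-09-28 08:59:00").
import csv
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def load_runs(path):
    """Yield (job, elapsed seconds) for each run in the export."""
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            start = datetime.strptime(row["start"], FMT)
            finish = datetime.strptime(row["finish"], FMT)
            yield row["job"], (finish - start).total_seconds()

if __name__ == "__main__":
    runs = sorted(load_runs("job_runs.csv"), key=lambda r: r[1], reverse=True)
    total = sum(sec for _, sec in runs) or 1
    for job, sec in runs:
        # Percentages are shares of summed elapsed time; with jobs running
        # in parallel this is not the same as share of wall-clock batch time.
        print(f"{job:30}  {sec:8.0f} s  {100 * sec / total:5.1f}%")

The same ranking idea, applied to per-link row counts, is a reasonable first cut at the stage-level question too.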

The interesting stuff will then come when I try to improve performance in the identified areas, but I'm sure that's a different set of questions!

Regards
James


admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

Dear all

Surely I'm not the only person who's wanted or needed to tune a DataStage application for increased throughput ... any comments, experiences, or suggestions would be very welcome, even if they are of the "don't bother because ..." type!

Thanks
James



-----Original Message-----
From: Ricketts, James [ mailto:JAMES.RICKETTS@POLK.CO.UK ]
Sent: 28 September 2001 08:59
To: datastage-users@oliver.com
Subject: Desperately seeking ... performance-bottlenecks
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

Hi James

My experience has been to split complex jobs into simpler ones: not too many transforms in one job and not too many lookups in one job. Also, reading from an ODBC/OCI source and writing a sequential file, then reading that sequential file in another job, seems to increase performance significantly. Initially I was reluctant to do this because I was effectively reading the same data twice; however, the increase in speed was worth it. Reading from ODBC/OCI the jobs used to run at around 2,500 rows/sec; reading from the sequential file they went to 20,000 rows/sec.

Something else I do when dealing with updates and inserts is to split the stream into an update stream and an insert stream. Both streams do inserts into temporary tables in an Oracle database, and those tables have no indexes when the insert is done. After the job has completed I call SQL*Plus (an Oracle utility), passing it SQL statements that do the insert/update, build the indexes, analyse the tables and so on. This way the Oracle database does the work and DataStage is not tying up the network. (We need a much faster network; we've only got a 100MB network card in the NT server, talking to a Unix box where Oracle lives.)
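
To make that hand-off concrete, here is a minimal sketch of the post-load step, with Python simply driving the SQL*Plus call. It assumes sqlplus is on the PATH, and the connect string, table names, columns and index name (stg_cust_upd, stg_cust_ins, customer, customer_ix1) are hypothetical placeholders rather than anything from the jobs described above.

# post_load.py - hand the set-based work to Oracle after the DataStage
# job finishes. All object names and the connect string are hypothetical.
import subprocess

SQL = """
WHENEVER SQLERROR EXIT FAILURE
-- apply the update stream from its staging table
UPDATE customer c
   SET (name, balance) = (SELECT s.name, s.balance
                            FROM stg_cust_upd s
                           WHERE s.cust_id = c.cust_id)
 WHERE EXISTS (SELECT 1 FROM stg_cust_upd s WHERE s.cust_id = c.cust_id);
-- append the insert stream
INSERT INTO customer (cust_id, name, balance)
SELECT cust_id, name, balance FROM stg_cust_ins;
COMMIT;
-- rebuild the index and refresh optimiser statistics
CREATE INDEX customer_ix1 ON customer (cust_id);
ANALYZE TABLE customer COMPUTE STATISTICS;
EXIT
"""

def run_post_load(connect="scott/tiger@orcl"):
    # -S runs SQL*Plus silently; the script is fed on stdin.
    result = subprocess.run(["sqlplus", "-S", connect],
                            input=SQL, text=True, capture_output=True)
    if result.returncode != 0:
        raise RuntimeError("post-load SQL failed:\n" + result.stdout)

if __name__ == "__main__":
    run_post_load()

A plain shell script would do equally well; the point is simply that the set-based work happens inside Oracle rather than row by row across the network.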

I'm not sure about other relational databases, but I'm sure they must have a similar utility you can call and pass parameters to.

Hope this helps.

Regards

Mark.


Locked