Performance Issues

Post questions here relating to DataStage Server Edition, for such areas as Server job design, DS BASIC, Routines, Job Sequences, etc.

Mayer
Participant
Posts: 10
Joined: Tue Aug 09, 2005 12:55 am

Performance Issues

Post by Mayer »

Hi All,
I am using 5 OCI stages with complex queries that have already been tuned as far as they can be. The queries take less time when fired directly against the database (at the SQL prompt).
When the job is run it processes 1 row/sec or even 0 rows/sec. One OCI stage is the stream input and the rest are all lookups (please don't suggest converting them to hashed files). Can anybody suggest what else can be done and what the reason might be?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If you just read the data from Oracle without lookups you are hopefully getting several thousand rows/second, right? The lookups are slowing the job down, as they are being done explicitly for each row.

You can speed these up by ensuring that you perform optimal queries, use only those columns needed, optimize connectivity, and perhaps even sort your data to try to make the lookups re-use buffers. These changes will most likely only slightly improve performance.

You can also change your job to a multi-instance one and modify your source data SELECT statement to use a data column to split the data. For instance, if you have a numeric column whose values are spread relatively evenly, you can modify the SELECT to include "MOD(<Column>,{number of parallel jobs})={instance number}". This will increase performance by running {number of parallel jobs} instances in parallel, but it will also increase your overall system load.
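
As an illustration only (the table and column names below are assumed, not taken from your job), the source SELECT for one instance of a four-way multi-instance job might look like this:

Code: Select all

-- Source query for instance number 2 of 4 parallel job instances
-- (ORDER_ID is an assumed, evenly distributed numeric column)
SELECT o.ORDER_ID, o.CUST_ID, o.AMOUNT
FROM   ORDERS o
WHERE  MOD(o.ORDER_ID, 4) = 2   -- replace 2 with the instance number 0..3
Each instance plugs in its own instance number, so between them the four instances cover every row exactly once.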

I would look very carefully at your reference tables and decide whether any of them can be integrated into the original SELECT statement. The most profitable exercise here will be to use hashed files for as many of the lookups as possible, especially where the volumes are low enough to allow the hashed files to be loaded into memory. I don't know why you don't want to do this.
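
For example (again, purely illustrative table and column names, not from the actual job), folding one small reference table into the source SELECT instead of firing a per-row lookup could look like:

Code: Select all

-- Assumed example: the CURRENCIES lookup folded into the source query,
-- so Oracle performs the join once instead of DataStage firing it per row
SELECT t.TXN_ID,
       t.AMOUNT,
       c.ISO_CODE
FROM   TRANSACTIONS t,
       CURRENCIES   c
WHERE  t.CURRENCY_ID = c.CURRENCY_ID (+)
The outer join (+) keeps every source row even when no reference row matches, which roughly mirrors what a lookup link returning NULL would give you.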
rkdatastage
Participant
Posts: 107
Joined: Wed Sep 29, 2004 10:15 am

Post by rkdatastage »

Hope you are fine with the design.
In my view, try to concentrate on performance areas like the transaction size and the array size.
Check whether you are able to connect to the database from all of the OCI stages.

Why are you not using a Hashed File stage? What is the problem? In general, performance-wise it is better to use a Hashed File stage than an OCI stage for lookups.
Hope you know it very well.

RK
Mayer
Participant
Posts: 10
Joined: Tue Aug 09, 2005 12:55 am

Post by Mayer »

Hi All,

Thanks a lot for the suggestions.

The reasons I am not using hashed files are:
1) the lookups are not what is taking the time;
2) the source (stream link) is taking a long time.

Somehow I need to improve the source OCI rather than the lookups.

I will convert all the lookups into hashed files and check again.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Mayer,

What is your pure read speed for the main data? What is your pure read speed for one of your lookup tables into a hashed file? Surely more than 1 row per second?
srinagesh
Participant
Posts: 125
Joined: Mon Jul 25, 2005 7:03 am

Post by srinagesh »

Try using an InterProcess stage.

Code: Select all

                             lookups
                                |
                                V
source -> Interprocess -> Transformation -> Interprocess -> Target
Last edited by srinagesh on Wed Nov 23, 2005 6:15 am, edited 1 time in total.
srinagesh
Participant
Posts: 125
Joined: Mon Jul 25, 2005 7:03 am

Post by srinagesh »

Oops, the link for the lookup should have been on top of the Transformer.

Sorry for that.

Also, ensure that your array size is optimal.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Edit your post and put code tags around it.
Mamu Kim
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Create THIS job using the same SQL and report the flow rate:

Code: Select all

OCI  ----->  SeqFile
This will serve as a starting point for determining where the bottleneck - if any - exists.

Is the Oracle server on the same machine as the DataStage server, or are you performing the lookups over a network connection?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Mayer
Participant
Posts: 10
Joined: Tue Aug 09, 2005 12:55 am

Major Issues (Using Selfjoin)

Post by Mayer »

Hi,
Please find the details below for further analysis.

I can't use a hashed file because of the following:
1. I have a source selecting a column A.
2. I have to do a lookup whose source query looks like this:
   select T1.A1 from T T1, T T2
   where <join conditions>
   and T2.A2 = :1 (:1 referring to source column A)
   --- note this is a self join.
   At the output, T1.A1 will be a key which is passed to the transformer.
3. Both the source and the lookups have complex queries which are already optimized as far as possible.

Hope this is clear. Optimization suggestions are needed at step 2. I am selecting a different column which will act as a key in the Transformer stage, and the lookup is on another column (it is a self join).

For example, suppose I am doing a self join on emp:

select M.mgrname from emp A, emp M
where A.mgrname = :1 (source mgrname)
and A.empid = M.empid
and ......

Note: M.mgrname <> A.mgrname, and M.mgrname is a key in the Transformer stage.

Can anyone suggest a better alternative for this within the same job?
Neil C
Participant
Posts: 46
Joined: Sun Apr 06, 2003 8:59 pm
Location: Auckland, New Zealand

Post by Neil C »

select T1.A1 from T T1, T T2 where <join conditions>
--- note this is a self join.
Why can this query not be used to create a hashed file? Just select T2.A2 as well as T1.A1, and make that the key of the hashed file. How many rows would this query generate?
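
As a rough sketch (reusing the T/T1/T2 aliases from your post; the join conditions are the same placeholder you gave), the hashed file could be loaded once with something like:

Code: Select all

-- Run once to populate a hashed file keyed on A2;
-- the job then looks A2 up in memory instead of firing the self join per row
SELECT T2.A2 AS LOOKUP_KEY,
       T1.A1 AS RESULT_COL
FROM   T T1,
       T T2
WHERE  <join conditions>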

You say that the lookup has been 'optimized at the maximum'. Could you please post a little more detail of the SQL used, the specification of the fields in the join condition, and the index(es) available on the emp table?

What does the access plan report look like? Is it using an index, or doing a table space scan?
Neil Courtney

"MG - Life's too short not to"