Performance Issues

Post questions here relating to DataStage Server Edition, for such areas as Server job design, DS BASIC, Routines, Job Sequences, etc.

Mayer
Participant
Posts: 10
Joined: Tue Aug 09, 2005 12:55 am

Performance Issues

Post by Mayer »

Hi All,
I am using 5 OCI stages with complex queries that have already been tuned as far as they can be. The queries take less time when fired directly against the database (at the SQL prompt).
When the job is run it processes 1 row/sec or even 0 rows/sec. One OCI stage is the stream input and the rest are all lookups (please don't suggest converting them to hashed files). Can anybody suggest what else can be done and what the reason might be?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

If you just read the data from Oracle without lookups you are hopefully getting several thousand rows/second, right? The lookups are slowing the job down, as they are being done explicitly for each row.

You can speed these up by ensuring that you perform optimal queries, use only those columns needed, optimize connectivity, and perhaps even sort your data to try to make the lookups re-use buffers. These changes will most likely only slightly improve performance.

You can also change your job to a multi-instance one and modify your source data SELECT statement to use a data column to split the data. For instance, if you have a numeric column whose values are spread relatively evenly, you can modify the SELECT to include "MOD(<Column>,{number of parallel jobs})={instance number}". This will increase performance by running {number of parallel jobs} instances in parallel, but it will also increase your overall system load.
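
As an illustration only (the table and column names below are assumed, not taken from your job), the source SELECT for one instance of a four-way multi-instance job might look like this:

Code: Select all

-- Source query for instance number 2 of 4 parallel job instances
-- (ORDER_ID is an assumed, evenly distributed numeric column)
SELECT o.ORDER_ID, o.CUST_ID, o.AMOUNT
FROM   ORDERS o
WHERE  MOD(o.ORDER_ID, 4) = 2   -- replace 2 with the instance number 0..3
Each instance plugs in its own instance number, so between them the four instances cover every row exactly once.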

I would look very carefully at your reference tables and decide whether any of them can be integrated into the original SELECT statement. The most profitable exercise here will be to use hashed files for as many of the lookups as possible, especially where the volumes are low enough to allow the hashed files to be loaded into memory. I don't know why you don't want to do this.
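
For example (again, purely illustrative table and column names, not from the actual job), folding one small reference table into the source SELECT instead of firing a per-row lookup could look like:

Code: Select all

-- Assumed example: the CURRENCIES lookup folded into the source query,
-- so Oracle performs the join once instead of DataStage firing it per row
SELECT t.TXN_ID,
       t.AMOUNT,
       c.ISO_CODE
FROM   TRANSACTIONS t,
       CURRENCIES   c
WHERE  t.CURRENCY_ID = c.CURRENCY_ID (+)
The outer join (+) keeps every source row even when no reference row matches, which roughly mirrors what a lookup link returning NULL would give you.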
rkdatastage
Participant
Posts: 107
Joined: Wed Sep 29, 2004 10:15 am

Post by rkdatastage »

Hope you are fine with the design.
In my view, try to concentrate on performance areas like the transaction size and the array size.
Check whether you are able to connect to the database from all of the OCI stages.

Why are you not using a Hashed File stage? What is the problem? In general, performance-wise it is better to use a Hashed File stage than an OCI stage for lookups.
Hope you know it very well.

RK
Mayer
Participant
Posts: 10
Joined: Tue Aug 09, 2005 12:55 am

Post by Mayer »

Hi All,

Thanks a lot for the suggestions.

The reasons I am not using hashed files are:
1) the lookups are not what is taking the time;
2) the source (stream link) is taking a long time.

Somehow I need to improve the source OCI rather than the lookups.

I will convert all the lookups into hashed files and check again.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Mayer,

What is your pure read speed for the main data? What is your pure read speed for one of your lookup tables into a hashed file? Surely more than 1 row per second?
srinagesh
Participant
Posts: 125
Joined: Mon Jul 25, 2005 7:03 am

Post by srinagesh »

Try using an InterProcess stage.

Code: Select all

                             lookups
                                |
                                V
source -> Interprocess -> Transformation -> Interprocess -> Target
Last edited by srinagesh on Wed Nov 23, 2005 6:15 am, edited 1 time in total.
srinagesh
Participant
Posts: 125
Joined: Mon Jul 25, 2005 7:03 am

Post by srinagesh »

Oops, the link for the lookup should have been on top of the Transformer.

Sorry for that.

Also, ensure that your array size is optimal.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Edit your post and put code tags around it.
Mamu Kim
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Create THIS job using the same SQL and report the flow rate:

Code: Select all

OCI  ----->  SeqFile
This will serve as a starting point for determining where the bottleneck - if any - exists.

Is the Oracle server on the same machine as the DataStage server, or are you performing the lookups over a network connection?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Mayer
Participant
Posts: 10
Joined: Tue Aug 09, 2005 12:55 am

Major Issues (Using Selfjoin)

Post by Mayer »

Hi,
Please find the details below for further analysis.

I can't use a hashed file because of the following:
1. I have a source selecting a column A.
2. I have to do a lookup whose source query looks like this:
   select T1.A1 from T T1, T T2
   where <join conditions>
   and T2.A2 = :1 (:1 referring to source column A)
   --- note this is a self join.
   At the output, T1.A1 will be a key which is passed to the transformer.
3. Both the source and the lookups have complex queries which are already optimized as far as possible.

Hope this is clear. Optimization suggestions are needed at step 2. I am selecting a different column which will act as a key in the Transformer stage, and the lookup is on another column (it is a self join).

For example, suppose I am doing a self join on emp:

select M.mgrname from emp A, emp M
where A.mgrname = :1 (source mgrname)
and A.empid = M.empid
and ......

Note: M.mgrname <> A.mgrname, and M.mgrname is a key in the Transformer stage.

Can anyone suggest a better alternative for this within the same job?
Neil C
Participant
Posts: 46
Joined: Sun Apr 06, 2003 8:59 pm
Location: Auckland, New Zealand

Post by Neil C »

select T1.A1 from T T1, T T2 where <join conditions>
--- note this is a self join.
Why can this query not be used to create a hashed file? Just select T2.A2 as well as T1.A1, and make that the key of the hashed file. How many rows would this query generate?
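
As a rough sketch (reusing the T/T1/T2 aliases from your post; the join conditions are the same placeholder you gave), the hashed file could be loaded once with something like:

Code: Select all

-- Run once to populate a hashed file keyed on A2;
-- the job then looks A2 up in memory instead of firing the self join per row
SELECT T2.A2 AS LOOKUP_KEY,
       T1.A1 AS RESULT_COL
FROM   T T1,
       T T2
WHERE  <join conditions>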

You say that the lookup has been 'optimized at the maximum'. Could you please post a little more detail of the SQL used, the specification of the fields in the join condition, and the index(es) available on the emp table?

What does the access plan report look like? Is it using an index, or doing a table space scan?
Neil Courtney

"MG - Life's too short not to"