DS Oracle optimization (parallel processing / query rewrite)

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

smleonard
Participant
Posts: 23
Joined: Tue Apr 27, 2004 11:48 am
Location: Westfield Center, OH

DS Oracle optimization (parallel processing / query rewrite)

Post by smleonard »

We're working with the Oracle plug-in stage and have run into some optimization issues. There have been at least two instances where a query is fully optimized in Oracle, yet the DataStage job runs extremely slowly.

Example #1: Reading data from a text file, doing a lookup via the Oracle plug-in against a reference table, and writing out to a text file. The reference table contains just 28 records. In the first run, the job took about 5 minutes to get through 1,000 rows of data (under 4 rows/sec). I then had our DBA set the reference table to non-parallel execution and ran the same job again; this time it ran at 298 rows/sec. The Oracle trace showed that most of the time in the first run was spent in the "PX Deq: Signal ACK" wait event. Is there any known issue where the plug-in stage does not work effectively against tables with parallel execution enabled? In this particular instance we can use a hashed file to avoid the problem, but that option will not always be available, so we need to understand why this is happening.
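For reference, the DBA-side change was presumably something along these lines; the table name below is made up, and you would want to check the current degree setting first:

    -- Check the parallel degree currently set on the 28-row reference table
    -- (REF_CODES is a placeholder name).
    SELECT table_name, degree
      FROM user_tables
     WHERE table_name = 'REF_CODES';

    -- Disable parallel execution for queries against this table, which is
    -- what made the "PX Deq: Signal ACK" waits disappear in the second run.
    ALTER TABLE ref_codes NOPARALLEL;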

Example #2: I was comparing a job that has been in production for some time against the same job in development, and found that the development copy ran much slower. I ran explain plans on all the queries and found one query whose plan differed. After speaking with the DBA, we discovered that a new materialized view had been built in development. To optimize other areas of our data warehouse, Oracle had been configured to allow query rewrite, and because the MV existed, the optimizer was choosing to use it instead of the base tables. With rewrite enabled the job took 25 minutes; with rewrite disabled it took 2 minutes. Oddly, the Oracle trace files suggested that the second run spent MORE time in Oracle, which points to something being done behind the scenes in DataStage.
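In case the detail helps anyone else: query rewrite can be switched off at the session level, or suppressed for a single statement with a hint. This is just a generic Oracle sketch (table and column names are illustrative), not necessarily how our DBA disabled it:

    -- Disable materialized view query rewrite for the current session only.
    ALTER SESSION SET QUERY_REWRITE_ENABLED = FALSE;

    -- Or suppress rewrite for one statement with the NOREWRITE hint,
    -- so the optimizer goes back to the base tables instead of the MV.
    SELECT /*+ NOREWRITE */ cust_id, SUM(amount)
      FROM sales
     GROUP BY cust_id;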

Again, I'm just looking to better understand what's going on in the background that I can't see from the job design. Our DBA has said these queries are optimized, and I agree: there's no reason a job should run at under 4 rows/sec when it's doing a lookup against a table containing 28 rows. We seem to be finding more and more of these issues, so any information would help.

Thanks,
-Sean
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Can you modify a copy of your job so that it does nothing apart from reading your data and writing it to /dev/null? That will give you the maximum attainable speed, and you can then start narrowing down the possible sources of the slowdown.

Also, are you actually reading your 28-row reference table from the database for each data row, rather than using a hashed file or lookup fileset? (I can't tell whether this is a PX or a Server problem.)
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Example #1
How many CPUs do you have that made your DBA switch on parallel execution? This problem usually arises due to indexes. I have faced it in DataStage Enterprise Edition, where the "PX Deq: Signal ACK" wait (the acknowledgment of an insert, update, or delete) was consuming the time. "PX Deq: Signal ACK" is an idle wait event raised by Oracle's parallel execution; the session simply waits for acknowledgments during row-level processing.
In EE there is an option to drop the indexes, insert the records, and then re-create the indexes. The same thing can be done in Server Edition (I haven't tried it lately, though) by executing before- and after-SQL to drop and re-create the indexes.
Any kind of update or insert against a table that has indexes in place can cause performance problems in Oracle; this is not the case with most other databases.
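A minimal sketch of the Server Edition approach, assuming a single index on the target table (the index and table names below are made up); the statements would go in the before- and after-SQL properties of the stage, or in before/after-job routines:

    -- Before-SQL: drop the index so the bulk insert is not slowed down.
    DROP INDEX fact_sales_ix1;

    -- After-SQL: re-create the index once the load has finished.
    CREATE INDEX fact_sales_ix1 ON fact_sales (sale_date, cust_id);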
IHTH
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Keep in mind that Example #1 is using an OCI stage to do a reference lookup, and that lookup fires once for each row passing through the job. I would imagine the overhead of running parallel queries for those singleton selects explains the increased execution time.
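To make that concrete: the reference lookup ends up issuing a singleton select per incoming row, roughly of this shape (table, column, and bind names are illustrative), so any parallel-query startup cost is paid a thousand times over for 1,000 rows. If user-defined SQL is in play, a NO_PARALLEL hint is one way to sidestep it without altering the table:

    -- Executed once per input row by the reference OCI stage (sketch only).
    SELECT /*+ NO_PARALLEL(r) */ r.code_desc
      FROM ref_codes r
     WHERE r.code_id = :1;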
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

@Craig
That's true, I missed that piece; it definitely eats up time. :)
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.