slow execution of simple job using Oracle connector

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

iq_etl
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

slow execution of simple job using Oracle connector

Post by iq_etl »

I have a very simple job that reads a dataset of changes (eight columns plus a change code) and applies them through an Oracle Connector using a PL/SQL CASE statement. Processing UPDATEs to about 230,000 records takes about 3 1/2 hours. By contrast, a very similar job that actually has more columns processes about 50,000 records in 16 seconds.

I asked our DBA to take a look at what it was doing, and she said that for most of the time the database is just waiting. The job finishes successfully with no warnings - I just can't figure out why it's so slow!

I've tried changing it to run in sequential mode, but that had no noticeable impact. (By default it is running in parallel mode on two nodes.)

Any suggestions or debugging techniques would be much appreciated. I'd also be happy to provide any other information that might be useful.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Can we start with an explanation of what "uses a PL/SQL CASE statement to apply the changes" means? Thanks. That, and what settings you are using in the Connector.
-craig

"You can never have too many knives" -- Logan Nine Fingers
iq_etl
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Post by iq_etl »

Absolutely. In the 'Properties' tab of the Oracle Connector stage, the write mode is set to PL/SQL and the following PL/SQL block is executed:

Code:

declare

  -- Note: these locals are never referenced below; the :names in the
  -- block are bind parameters supplied by the Connector for each row.
  -- (INT takes no precision in PL/SQL, so the numeric columns are
  -- declared as NUMBER(n).)
  change_code   nchar(1);
  ccyys         number(5);
  snapshot      nvarchar2(3);
  sri           number(9);
  class_unique  nvarchar2(5);
  pe_occurrence nchar(3);
  dept_abbr     nvarchar2(3);
  course_number nvarchar2(6);

begin

  -- Simple CASE with no ELSE: any change code other than 1/2/3
  -- raises CASE_NOT_FOUND and fails the row.
  CASE :change_code
    when '1' then
      INSERT into UTDBA.NR_STG_STUDENT_CLASSES
        (CCYYS, SNAPSHOT, SRI, CLASS_UNIQUE, PE_OCCURRENCE, DEPT_ABBR, COURSE_NUMBER, UIN)
      VALUES
        (:CCYYS, :SNAPSHOT, :SRI, :CLASS_UNIQUE, :PE_OCCURRENCE, :DEPT_ABBR, :COURSE_NUMBER, :UIN);
    when '2' then
      DELETE from UTDBA.NR_STG_STUDENT_CLASSES
       where CCYYS = :CCYYS
         and SNAPSHOT = :SNAPSHOT
         and SRI = :SRI
         and PE_OCCURRENCE = :PE_OCCURRENCE;
    when '3' then
      UPDATE UTDBA.NR_STG_STUDENT_CLASSES
         SET CLASS_UNIQUE = :CLASS_UNIQUE,
             DEPT_ABBR = :DEPT_ABBR,
             COURSE_NUMBER = :COURSE_NUMBER,
             UIN = :UIN
       WHERE CCYYS = :CCYYS
         AND SNAPSHOT = :SNAPSHOT
         AND SRI = :SRI
         AND PE_OCCURRENCE = :PE_OCCURRENCE;
  end case;

end;
For the run times I specified, all rows in the file have a change code of 3 (so it is UPDATE-ing).

The table action is set to Append, and pretty much all of the other settings are set to the default.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Gotcha... an anonymous block. Let's start with the obvious first question: are the fields in your 'where' clause indexed in such a way that supports the update (and the delete)?
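
One quick way to check (a sketch; it assumes your account can read the ALL_IND_COLUMNS dictionary view):

Code:

-- List every index on the target table with its columns in order.
-- The WHERE-clause columns should match the leading columns of one index.
SELECT index_name, column_position, column_name
  FROM all_ind_columns
 WHERE table_owner = 'UTDBA'
   AND table_name  = 'NR_STG_STUDENT_CLASSES'
 ORDER BY index_name, column_position;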
-craig

"You can never have too many knives" -- Logan Nine Fingers
iq_etl
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Post by iq_etl »

I think so. The unique key is defined as the combination of CCYYS, SNAPSHOT, SRI, CLASS_UNIQUE, and PE_OCCURRENCE. As it happens, CLASS_UNIQUE is not really needed - the combination of the other four columns is still unique. I've tried it with and without 'AND CLASS_UNIQUE = :CLASS_UNIQUE' included in the query, and it does not appear to make a difference in performance either way.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Have your DBA trace the session and see if the index is being used for the updates. Skipping one of the fields in the composite index (unless it is on the 'trailing' edge) would preclude index use, but using all five of them should fare much better.

I'm assuming your issue is that the index isn't being used and you are doing a full table scan for each update. Your DBA should be able to confirm/deny.
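
If a full trace is more than you want to start with, an EXPLAIN PLAN on the update should show it too. A sketch, run from SQL*Plus or similar (the binds don't need values for EXPLAIN PLAN, though the optimizer will guess their types, so treat the result as indicative):

Code:

EXPLAIN PLAN FOR
  UPDATE UTDBA.NR_STG_STUDENT_CLASSES
     SET CLASS_UNIQUE = :CLASS_UNIQUE
   WHERE CCYYS = :CCYYS
     AND SNAPSHOT = :SNAPSHOT
     AND SRI = :SRI
     AND PE_OCCURRENCE = :PE_OCCURRENCE;

-- INDEX UNIQUE/RANGE SCAN = good; TABLE ACCESS FULL = your 3.5 hours.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);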
-craig

"You can never have too many knives" -- Logan Nine Fingers
iq_etl
Premium Member
Posts: 105
Joined: Tue Feb 08, 2011 9:26 am

Post by iq_etl »

Thanks. I've asked our DBA to look at this again, and will post the results.
koti9
Participant
Posts: 52
Joined: Wed Nov 09, 2005 10:51 am

Post by koti9 »

I know it's an old post, but I thought I'd share a solution to the issue. I was facing the same problem with a slow-running job doing inserts/updates/deletes. The obvious choice was PL/SQL to do all three in one shot, but instead of the CASE statement mentioned above I tried a MERGE with update/delete/insert, and the MERGE was also far too slow. Then I split the updates/inserts and the deletes into separate streams, and the job finished very quickly: process the updates and inserts with 'Update then insert', and the deletes with 'Delete'.
We just need to make sure the inserts, updates, and deletes aren't touching the same rows.
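
For anyone reading later, the split amounts to issuing plain DML per stream instead of one anonymous block, which the Connector can batch row sets against. A sketch using the table and bind names from the earlier posts in this thread (my own table was different):

Code:

-- Update/insert stream: a plain UPDATE ('Update then insert' falls back
-- to the INSERT from the earlier post when no row matches).
UPDATE UTDBA.NR_STG_STUDENT_CLASSES
   SET CLASS_UNIQUE = :CLASS_UNIQUE,
       DEPT_ABBR = :DEPT_ABBR,
       COURSE_NUMBER = :COURSE_NUMBER,
       UIN = :UIN
 WHERE CCYYS = :CCYYS
   AND SNAPSHOT = :SNAPSHOT
   AND SRI = :SRI
   AND PE_OCCURRENCE = :PE_OCCURRENCE;

-- Delete stream, run separately:
DELETE FROM UTDBA.NR_STG_STUDENT_CLASSES
 WHERE CCYYS = :CCYYS
   AND SNAPSHOT = :SNAPSHOT
   AND SRI = :SRI
   AND PE_OCCURRENCE = :PE_OCCURRENCE;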

Thanks & Regards
Koti