Tuning UPSERT using Oracle Connector Stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

SwathiCh
Premium Member
Posts: 64
Joined: Mon Feb 08, 2010 7:17 pm

Tuning UPSERT using Oracle Connector Stage

Post by SwathiCh »

Hi All,

I am trying to upsert 70 million rows into a table in an Oracle database. Of the 70 million, 15 million are updates and the rest are straight inserts.

The table is indexed on the columns used in the WHERE clause of the UPDATE query.

I am using the Oracle Enterprise stage with Write Mode set to Upsert and Upsert Order set to "Update then insert".

I have already set environment variables to improve the job performance:
$APT_ORAUPSERT_COMMIT_ROW_INTERVAL = 100000
$APT_ORAUPSERT_COMMIT_TIME_INTERVAL = 60

The job ran for 3 hours in QA and completed successfully, but in PROD the upsert is running at about 300 rows/sec.

To tune this job, I am going to split the INSERTs and UPDATEs into separate jobs and use the Oracle Connector stage with
Array Size = 2000 and Transaction Record Count = 50000.
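As an illustration of what those two settings control, here is a small Python sketch. This is a pure simulation, not DataStage or Oracle client code: it just counts the database round trips and commits implied by a given array size and commit row count, using the figures from this thread.

```python
# Illustrative simulation only (not DataStage code): how array size and
# transaction record count translate into round trips and commits.

def batch_counts(total_rows: int, array_size: int, commit_rows: int):
    """Return (round_trips, commits) for a batched write stream."""
    # Each executed array binds up to `array_size` rows -> one round trip.
    round_trips = -(-total_rows // array_size)   # ceiling division
    # A commit is issued roughly every `commit_rows` rows.
    commits = -(-total_rows // commit_rows)
    return round_trips, commits

trips, commits = batch_counts(70_000_000, array_size=2000, commit_rows=50_000)
print(trips, commits)  # 35000 round trips, 1400 commits
```

With an array size of 1 (row-by-row execution) the same load would cost 70 million round trips, which is why array binding matters far more than the commit interval alone.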

My question: are there any other parameters I can add, or settings I can change, in the Connector stage to improve the job performance?

I can't use bulk load for the INSERTs, since the table is already indexed on many columns and many other processes use the table at the same time.
--
Swathi Ch
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Re: Tuning UPSERT using Oracle Connector Stage

Post by kwwilliams »

Change your order of operations: perform an insert, then an update. This requires a unique index on the key of the table. The job will try to insert all 70 million rows and will then update the records that raised a unique constraint violation. With this method there will be 15 million insert failures that then become updates.

Whether to use update-then-insert or insert-then-update should be based on which one will succeed most often. The way your job is set up, the update will be attempted 70 million times and will only succeed 15 million times.
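The arithmetic behind that advice can be made explicit. A quick sketch, using only the row counts already given in this thread:

```python
# Compare total statement executions for the two upsert orders,
# using the thread's figures: 70M rows total, 15M of them are updates.
total, updates = 70_000_000, 15_000_000
inserts = total - updates  # 55M genuinely new rows

# Update-then-insert: every row first attempts an UPDATE;
# the 55M rows that match nothing then fall through to an INSERT.
update_then_insert = total + inserts   # 125M statement executions

# Insert-then-update: every row first attempts an INSERT;
# only the 15M unique-constraint violations fall through to an UPDATE.
insert_then_update = total + updates   # 85M statement executions

print(update_then_insert, insert_then_update)  # 125000000 85000000
```

So with this data mix, insert-then-update executes roughly 40 million fewer statements, which is why the "most likely to succeed first" ordering matters.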
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Re: Tuning UPSERT using Oracle Connector Stage

Post by kwwilliams »

FYI, your subject line says you are using the Oracle Connector, but the posting says you are using the Enterprise stage. My initial advice on inserts and updates wouldn't change, but if that doesn't work for you, please clarify which one you are using.
SwathiCh
Premium Member
Posts: 64
Joined: Mon Feb 08, 2010 7:17 pm

Post by SwathiCh »

Thanks, Williams, for your quick response.

I used the Oracle Enterprise stage in the current job. During tuning, the admins suggested using the Oracle Connector stage instead of the Enterprise stage, so I am using the Connector stage in the new jobs.

I tried insert-then-update initially, but it took longer than update-then-insert.
--
Swathi Ch
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

Post your job design and your insert and update statements.

Is your Oracle table partitioned, and are you performing partitioned updates/inserts?

If you are inserting 55 million rows into the table daily, I would assume the table is quite large. How often are the table statistics updated?
SwathiCh
Premium Member
Posts: 64
Joined: Mon Feb 08, 2010 7:17 pm

Post by SwathiCh »

I am analyzing the table daily, before and after the UPSERT, so the stats are updated daily.

And my job design is

OracleEnterpriseStage--------->Transformer----------->OracleEnterpriseStage

There are no transformations for the INSERT and UPDATE; they are straight SQL queries.

My initial load will be 70 million rows and my daily delta about 2 million, most of which are probably updates.

So, any better ideas to tune this job and bring down the run time?
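For anyone following the split-the-streams approach suggested above: conceptually, the job routes each row to an insert or an update link based on whether its key already exists in the target. A minimal Python sketch of that routing logic (hypothetical data; in a real job this would be a lookup against the target table's keys feeding a Transformer's link constraints):

```python
# Illustrative routing only (not DataStage code): send rows whose key
# already exists to the update stream, the rest to the insert stream.

def split_rows(rows, existing_keys, key="id"):
    """Split rows into (inserts, updates) based on key existence."""
    inserts, updates = [], []
    for row in rows:
        (updates if row[key] in existing_keys else inserts).append(row)
    return inserts, updates

rows = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
existing = {2}  # keys already present in the target table
ins, upd = split_rows(rows, existing)
print(len(ins), len(upd))  # 2 1
```

Each stream can then use the statement best suited to it: plain inserts (bulk where allowed) on one link, keyed updates on the other, with no wasted failed attempts.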
--
Swathi Ch
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

Are you sure that your bottleneck is the upsert? If you run the extract Oracle Enterprise stage into a Peek stage, how many rows per second do you get?

Analyzing stats doesn't mean that you have performant update/insert statements.

Is your table partitioned and are you performing partitioned reads/writes?
tsanan
Premium Member
Posts: 8
Joined: Thu Mar 10, 2005 8:45 am

Post by tsanan »

SwathiCh wrote: I am analyzing the table daily before UPSERT and after UPSERT so STATS will be updated daily. [...] So any better ideas to tune this job to bring down the run time?



From your initial post it looks like the run time in QA is acceptable. If that is the case and you are comparing the job's performance across environments, you should consider the capacity, performance, and network of both the QA and PROD environments, for Oracle as well as DataStage.
SwathiCh
Premium Member
Posts: 64
Joined: Mon Feb 08, 2010 7:17 pm

Post by SwathiCh »

Williams,

It is clearly a problem with the UPSERT only: if I run the job with the target Oracle stage removed (replaced by a Peek), it processes around 12K rows/sec.

Anyway, I am changing the job layout to handle inserts and updates separately. So what steps do I need to take in my new job design while using the Oracle Connector stage, apart from the settings I have already made? (I already set $APT_ORAUPSERT_COMMIT_ROW_INTERVAL to 100000 and $APT_ORAUPSERT_COMMIT_TIME_INTERVAL to 60.)

tsanan:
Since the problem is in PROD, I need to tune this job somehow while maintaining the functionality. We already did comparisons between QA and PROD; those helped me provide justification for the issues.
--
Swathi Ch
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

I wouldn't consider 12k/second highly performing, but if it works for you it is good enough.

You're not answering my question about the table structure. Tuning your array size and commit points will buy you a little. Running in parallel against an Oracle partitioned table will increase performance dramatically.

In the Connector, which you said you are using, there are no extra parameters to tune beyond what you see in the stage properties.
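To illustrate what a partitioned parallel write does conceptually: rows are distributed across independent writer streams by hashing the partition key, so each stream writes its own slice concurrently. A hedged Python sketch (the hash scheme here is a stand-in; DataStage does this via link partitioning, not user code):

```python
# Conceptual illustration only: hash-partition rows across N writer
# streams so each stream can write its partition independently.
from zlib import crc32

def partition(rows, n_writers, key="id"):
    """Distribute rows across n_writers streams by hashing the key."""
    streams = [[] for _ in range(n_writers)]
    for row in rows:
        streams[crc32(str(row[key]).encode()) % n_writers].append(row)
    return streams

rows = [{"id": i} for i in range(1000)]
streams = partition(rows, 4)
print(sum(len(s) for s in streams))  # 1000
```

The key property is that every row lands in exactly one stream, so the writers never contend for the same partition, which is where the dramatic speedup on a partitioned table comes from.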
SwathiCh
Premium Member
Posts: 64
Joined: Mon Feb 08, 2010 7:17 pm

Post by SwathiCh »

Williams,

My table is not partitioned. I also cannot do a truncate-and-load, since many other applications may be selecting from the same table while I am doing the UPSERT.
--
Swathi Ch
jseclen
Participant
Posts: 133
Joined: Wed Mar 05, 2003 4:19 pm
Location: Lima - Peru. Sudamerica
Contact:

Table size

Post by jseclen »

Hi SwathiCh,

Did you consider the size of the table in both environments? Maybe it is much larger in the production environment than in the environments where you ran your tests.

The indexes may also be making the load heavy.

The upsert process runs either insert-then-update or update-then-insert; which of these forms are you applying?

Have you considered splitting the insert and update records so you can use two Oracle stages: one for the inserts (using Load) and one for the updates?
Saludos,

Miguel Seclén
Lima - Peru