Page 1 of 1

Tunning UPSERT using Oracle Connector Stage

Posted: Mon Dec 06, 2010 10:49 am
by SwathiCh
Hi All,

I am trying to upsert 70 millions of data into a table in Oracle Database. Out of 70 millions, 15 million are updates and rest all of them are straight inserts.

Table is indexed on the columns which are used in WHERE CLAUSE in UPDATE QUERY.

I am using ORALCE ENTERPRISE STAGE with WRITE-MODE as UPSERT, UPSERT-ORDER as Update then insert.

I am already using the Environmental variables like
$APT_ORAUPSERT_COMMIT_ROW_INTERVAL to 100000
$APT_ORAUPSERT_COMMIT_TIME_INTERVAL to 60
to improve the job performance.

Job ran for 3 hrs in QA and completed successfully but in PROD the upsert is running @300/Sec.

For tunning this job,
I am going to split the INSERTs and UPDATEs into separate jobs and using the ORACLE CONNECTOR STAGE with
ARRAY_SIZE=2000 and TRANSACTION RECORD_COUNT=50000.

The question here is
Is there any other parameters can I add/ change the settings in Connector stage to improve the job performance?

I can't use BULK load for INSERTs since the table is already indexed on many columns and many other processes are using the table on same time.

Re: Tunning UPSERT using Oracle Connector Stage

Posted: Mon Dec 06, 2010 11:13 am
by kwwilliams
Change your order of operations, perform an insert then update. This requires a unique index on the key of the table. It will try to insert all 70 million rows and will then perform an update on records where there was a unique constraint violation. With this method there will be 15 million insert failures that then perform an update.

Whether to use an update then insert or insert then update should be based upon which one will be succesful most often. The way your jobs is set up the update will be attempted 70 million times and will only be succesful 15 million times.

Re: Tunning UPSERT using Oracle Connector Stage

Posted: Mon Dec 06, 2010 11:22 am
by kwwilliams
FYI, your posting states that you use the Oracle Connector in the subject line, and in the posting says you are using the enterprise stage. My initial advice wouldn't change on using insert and updates. But if that doesn't work for you please clarify which you are using.

Posted: Mon Dec 06, 2010 11:30 am
by SwathiCh
Thanks Williams for your quick response.

I used Oracle enterprise stage in the current job, during the tunning Admins suggesting to use Oracle connector stage instead of Enterprise stage. So in the new jobs I am using Connector stage.

I used INSERT then UPDATE initially but it taking longer time than Update then insert.

Posted: Mon Dec 06, 2010 5:09 pm
by kwwilliams
Post your job design and your insert and update statements.

Is your Oracle table partitioned and are you performed partitioned updates/inserts?

If you are daily inserting 55 million rows in the table daily, I would assume that the table is quite large, how often are the table statistics getting updated.

Posted: Tue Dec 07, 2010 8:23 am
by SwathiCh
I am analyzing the table daily before UPSERT and after UPSERT so STATS will be updated daily.

And my job design is

OracleEnterpriseStage--------->Transformer----------->OracleEnterpriseStage

There are no transformations in INSERT and UPDATE, they are straight SQL queries.

My Initial load will be 70 millions and my delta would be 2 millions daily, in delta most of them are updates probably.

So any better ideas to tune this job to bring down the run time?

Posted: Tue Dec 07, 2010 12:13 pm
by kwwilliams
Are you sure that your bottleneck is on the upsert? If you run the extract oracle enterprise stage into a peak, how many rows per second are you getting?

Analyzing stats doesn't mean that you have performant update/insert statements.

Is your table partitioned and are you performing partitioned reads/writes?

Posted: Tue Dec 07, 2010 3:04 pm
by tsanan
SwathiCh wrote:I am analyzing the table daily before UPSERT and after UPSERT so STATS will be updated daily.

And my job design is

OracleEnterpriseStage--------->Transformer----------->OracleEnterpriseStage

There are no transformations in INSERT and UPDATE, they are straight SQL queries.

My Initial load will be 70 millions and my delta would be 2 millions daily, in delta most of them are updates probably.

So any better ideas to tune this job to bring down the run time?



From your initial post it looks like the run time from QA is acceptable. If that is the case and you are comparing performance of the Job across different environments, you should consider capacities/performance and network of both QA and PROD environments - Oracle as well as Datastage.

Posted: Tue Dec 07, 2010 3:59 pm
by SwathiCh
Williams,

It is clearly problem with UPSERT only, If I run the job by removing target Oracle stage (by putting peek), it is processing around 12K/Sec.

Any way I am changing the job layout to handle Inserts and updates separately. So what are the necessary steps I need to take in my new job design while using the Oracle Connector stage apart from the settings which I already did (I already added $APT_ORAUPSERT_COMMIT_ROW_INTERVAL to 100000
$APT_ORAUPSERT_COMMIT_TIME_INTERVAL to 60 )

tsanan :-
As it is giving the problems in PROD, somehow I need to tune this job to maintain the functionality. We already did coparisions between QA and PROD those helped me for providing the justification for issues.

Posted: Wed Dec 08, 2010 2:25 pm
by kwwilliams
I wouldn't consider 12k/second highly performing, but if it works for you it is good enough.

You're not answering my question about the table structure. Tuning your array and commit points will buy you a little. Running in parallel on an Oracle partitioned table will increase poerformance dramatically.

In the connector, which you said you are using there are no parameters to tune just what you see for stage properties.

Posted: Thu Dec 09, 2010 8:32 am
by SwathiCh
Williams,

My table is not partitioned, even I can not do Truncate & load also since many other applications may access the same table for select at same time when I am doing UPSERT.

Table size

Posted: Wed Dec 15, 2010 11:33 am
by jseclen
Hi SwathiCh,

Did you consider the size of the table in both environments? maybe in the production environment is more larger than the environments where you do the tests.

The indexes also may be causing the load to be heavy.

The upsert process involves running insert-update or update-insert, which of these forms are applying?

Have you considered differentiating insert and update records to use two load oracle stages, one for inserts (using Load) and one for updates?