improve performance of OCI stage
Moderators: chulett, rschirm, roy
Hi,
We have a job where we pull data from an Oracle table; one OCI stage inserts data without clearing, and another OCI stage updates existing data.
But the data volume is very large, around 30 million records, and the record length is around 40 characters. Right now I have the array size set to 5000 and the transaction size also set to 5000, but the job is very slow at 1400 rows/sec.
Can someone throw some light on how to improve the performance of this job?
Thanks
sheema
-
- Participant
- Posts: 407
- Joined: Mon Jun 27, 2005 8:54 am
- Location: Walker, Michigan
- Contact:
Re: improve performance of OCI stage
So, you are using the OCI stage to run over 30 million dynamically created SQL statements that must all be parsed and executed on your Oracle database? Can you just produce a flat file and run sqlldr with the replace option, or bulk load your data to a dummy table that you can run the MERGE command against?
sheema wrote:Hi,
We have a job where we pull data from an Oracle table; one OCI stage inserts data without clearing, and another OCI stage updates existing data.
But the data volume is very large, around 30 million records, and the record length is around 40 characters. Right now I have the array size set to 5000 and the transaction size also set to 5000, but the job is very slow at 1400 rows/sec.
Can someone throw some light on how to improve the performance of this job?
Thanks
sheema
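The flat-file route suggested here replaces 30 million per-row SQL statements with a single direct-path load. A minimal SQL*Loader control file might look roughly like this (a sketch only; the table name `stage_tbl`, the column names, and the data file name are assumptions, not taken from the job):

```
-- stage_load.ctl: illustrative control file for a direct-path load
LOAD DATA
INFILE 'stage_data.dat'
REPLACE
INTO TABLE stage_tbl
FIELDS TERMINATED BY ','
(id, col1)
```

This would be invoked as something like `sqlldr userid=... control=stage_load.ctl direct=true`; REPLACE clears the table before loading, and the direct path bypasses per-row SQL parse/execute overhead entirely.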
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
What I have done in the past is separate out the inserts and updates. This is really a performance-tuning question: how to handle a large volume of data going into an already populated table that must take both inserts and updates.
First I separate the inserts and bulk load them into the target, then run the updates via another job. If performance is still not acceptable, I run the updates first; then, if there are indexes or constraints that can be disabled or dropped, I disable or drop them, insert all rows via bulk load, and re-enable the constraints with the NOVALIDATE option.
Unfortunately, in most cases there is no standard way of speeding up the process, because each table is different, with different constraints, indexes, etc.
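The disable/load/enable sequence described above could be sketched as follows (table, constraint, and index names are assumptions; note that disabling a primary key can drop its underlying unique index, so this needs checking against the actual schema before use):

```sql
-- Sketch of the disable/load/re-enable pattern; all names are illustrative.
ALTER TABLE target_tbl DISABLE CONSTRAINT target_tbl_pk;
ALTER INDEX target_tbl_ix UNUSABLE;

-- ... bulk load here, e.g. sqlldr with direct=true, or the
-- Oracle Enterprise stage in Load mode ...

ALTER INDEX target_tbl_ix REBUILD;
-- NOVALIDATE enforces the constraint for new rows without
-- re-checking the rows just loaded.
ALTER TABLE target_tbl ENABLE NOVALIDATE CONSTRAINT target_tbl_pk;
```

The win comes from not maintaining the index row by row during the load; the one-time rebuild afterwards is usually much cheaper.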
Datawarehouse Consultant
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I have split up my inserts and updates into two jobs.
My job design is as below:
OracleEnterprise1 -----> Trnsfr -----> Trnsfr1 -----> OracleEnterprise2
In the OracleEnterprise1 stage I extract data from an Oracle table. In Trnsfr I trim 3 or 4 columns and add a couple of new columns, then map them to Trnsfr1 and load into the Oracle table using the OracleEnterprise2 stage (insert in job 1, update in job 2).
The number of updates is greater than the number of inserts into the table.
Is there any way I can speed up the process?
Thanks
-
- Participant
- Posts: 407
- Joined: Mon Jun 27, 2005 8:54 am
- Location: Walker, Michigan
- Contact:
Truncate and bulk load a table with a similar schema, then use the Oracle MERGE command to populate the production table: WHEN MATCHED on the primary key, update the existing rows; WHEN NOT MATCHED, insert into the production table.
sheema wrote:I have split up my inserts and updates into two jobs.
My job design is as below:
OracleEnterprise1 -----> Trnsfr -----> Trnsfr1 -----> OracleEnterprise2
In the OracleEnterprise1 stage I extract data from an Oracle table. In Trnsfr I trim 3 or 4 columns and add a couple of new columns, then map them to Trnsfr1 and load into the Oracle table using the OracleEnterprise2 stage (insert in job 1, update in job 2).
The number of updates is greater than the number of inserts into the table.
Is there any way I can speed up the process?
Thanks
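A MERGE of the shape described here could look like this (a sketch only; `prod_tbl`, `stage_tbl`, and the column names are assumptions standing in for the real schema):

```sql
-- Upsert from the bulk-loaded staging table into production.
-- One set-based statement replaces per-row insert/update traffic.
MERGE INTO prod_tbl t
USING stage_tbl s
ON (t.id = s.id)                      -- join on the primary key
WHEN MATCHED THEN
  UPDATE SET t.col1 = s.col1,
             t.col2 = s.col2
WHEN NOT MATCHED THEN
  INSERT (id, col1, col2)
  VALUES (s.id, s.col1, s.col2);
```

Since the thread notes updates outnumber inserts, most rows take the WHEN MATCHED branch, but Oracle resolves both paths in a single pass over the staging table.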
-
- Participant
- Posts: 407
- Joined: Mon Jun 27, 2005 8:54 am
- Location: Walker, Michigan
- Contact:
Define manually? For instance, you could write your own shell script and use the External Source stage to invoke the MERGE command, capture the return code, and feed it back to DataStage; any results written to standard output can be sent down a link and into a sequential file. You could use a single varchar column with no delimiter to capture the results. Then you can still invoke this job through a DataStage sequencer if that is a must.
sheema wrote:Thanks,
How is the MERGE command supposed to run? Will it be run manually by a DBA to load the table?
Thanks
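One possible shape for such a script (a sketch under assumptions: the connection string, `prod_tbl`/`stage_tbl`, and the columns are illustrative, and `WHENEVER SQLERROR` maps a failed MERGE onto the shell return code that DataStage would see):

```shell
#!/bin/sh
# Hypothetical wrapper for running the MERGE from a DataStage
# External Source stage. All object names are illustrative.
run_merge() {
  sqlplus -s "$1" <<'EOF'
WHENEVER SQLERROR EXIT FAILURE
MERGE INTO prod_tbl t
USING stage_tbl s
ON (t.id = s.id)
WHEN MATCHED THEN UPDATE SET t.val = s.val
WHEN NOT MATCHED THEN INSERT (id, val) VALUES (s.id, s.val);
COMMIT;
EXIT SUCCESS
EOF
  rc=$?
  # Single status line on stdout: DataStage can read this down the
  # output link into one undelimited varchar column.
  echo "MERGE_RC=$rc"
  return "$rc"
}
```

The External Source stage would name this script as its source program; the `MERGE_RC=` line flows down the link while the exit status drives job success or failure in the sequencer.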