
improve performance of OCI stage

Posted: Wed Oct 08, 2008 11:47 am
by sheema
Hi,

We have a job where we pull data from an Oracle table; one OCI stage inserts data without clearing the table, and another OCI stage updates existing data.

The data volume is very large, around 30 million records, and the record length is around 40 characters. Right now I have the array size set to 5000 and the transaction size also set to 5000, but the job is very slow at about 1400 rows/sec.
Can someone throw some light on how to improve the performance of this job?

Thanks
sheema

Posted: Wed Oct 08, 2008 1:23 pm
by sheema
Can someone throw some light on this?


Thanks

Re: improve performance of OCI stage

Posted: Wed Oct 08, 2008 2:09 pm
by Ultramundane
sheema wrote: Hi,

We have a job where we pull data from an Oracle table; one OCI stage inserts data without clearing the table, and another OCI stage updates existing data.

The data volume is very large, around 30 million records, and the record length is around 40 characters. Right now I have the array size set to 5000 and the transaction size also set to 5000, but the job is very slow at about 1400 rows/sec.
Can someone throw some light on how to improve the performance of this job?

Thanks
sheema
So, you are using the OCI stage to run over 30 million dynamically created SQL statements that must all be parsed and executed on your Oracle database? Can you just produce a flat file and run sqlldr with the REPLACE option, or bulk load your data into a dummy table that you can run the MERGE command against?
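
For illustration only, a minimal sqlldr control file for the flat-file approach might look like the following; the file name, table name, column names, and delimiter are all placeholders, and note that REPLACE deletes the table's existing rows before loading:

    LOAD DATA
    INFILE 'extract.dat'
    REPLACE
    INTO TABLE target_table
    FIELDS TERMINATED BY '|'
    (key_col, col2, col3)

It could then be invoked with something like sqlldr userid=scott/tiger@orcl control=target.ctl direct=true log=target.log (direct path is usually the faster choice at volumes like this).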

Posted: Wed Oct 08, 2008 2:36 pm
by sheema
OK, thanks.

Will creating a parallel job with the Oracle Enterprise stage help in any manner, or is using sqlldr the best option?

Posted: Wed Oct 08, 2008 2:55 pm
by ray.wurlod
Oracle Enterprise with Write Method = Load uses sqlldr.

Posted: Thu Oct 09, 2008 11:44 am
by sheema
I have developed a parallel job, and I did not change the Oracle Enterprise stage options.
I see that around 800 rows per second are being processed, whereas the server job was giving around 2000 rows per second. Is it a better option to run this as a parallel job or as a server job?

Thanks

Posted: Thu Oct 09, 2008 12:51 pm
by shamshad
Sheema,

Are you inserting and updating data in the target in one job?

Posted: Thu Oct 09, 2008 12:57 pm
by sheema
I am doing both inserts and updates.
For inserts I am using Write Method = Load, and for updates the Write Method is Upsert.

Thanks

Posted: Thu Oct 09, 2008 1:06 pm
by shamshad
What I have done in the past is separate out the inserts and updates. This is more of a performance-tuning question: how to handle a large amount of data going into an already populated table that must take both inserts and updates.

First I separate out the inserts, bulk load them into the target, and then run the updates via another job. If performance is still not acceptable, I run the updates first; then, if there are indexes/constraints that can be disabled or dropped, I disable/drop them, insert all rows via bulk load, and finally re-enable the constraints with the NOVALIDATE option.
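
As an illustrative sketch only (the table, constraint, and index names below are placeholders, and whether a given constraint can safely be disabled depends on the table), the disable/load/enable sequence in Oracle looks like:

    ALTER TABLE target_table DISABLE CONSTRAINT target_fk;
    ALTER INDEX target_idx UNUSABLE;
    -- bulk load the rows here (sqlldr direct path or Oracle Enterprise Load)
    ALTER INDEX target_idx REBUILD;
    ALTER TABLE target_table ENABLE NOVALIDATE CONSTRAINT target_fk;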

Unfortunately, in most cases there is no standard way of speeding up the process, because each table is different, with different constraints, indexes, etc.

Posted: Thu Oct 09, 2008 2:16 pm
by ray.wurlod
And stop thinking that row/sec is a meaningful metric of "performance" in ETL. It isn't.

Posted: Thu Oct 09, 2008 2:58 pm
by sheema
I have split up my inserts and updates into 2 jobs.
My job design is as below:

OracleEnterprise1 -----> Trnsfr -----> Trnsfr1 -----> OracleEnterprise2

In the OracleEnterprise1 stage I extract data from an Oracle table. In Trnsfr I trim 3 or 4 columns and add a couple of new columns, then map them through Trnsfr1 and write to the Oracle table using the OracleEnterprise2 stage (load in job 1, update in job 2).

The number of updates is greater than the number of inserts into the table.

Is there any way I can speed up the process?

Thanks

Posted: Thu Oct 09, 2008 6:35 pm
by Ultramundane
sheema wrote: I have split up my inserts and updates into 2 jobs.
My job design is as below:

OracleEnterprise1 -----> Trnsfr -----> Trnsfr1 -----> OracleEnterprise2

In the OracleEnterprise1 stage I extract data from an Oracle table. In Trnsfr I trim 3 or 4 columns and add a couple of new columns, then map them through Trnsfr1 and write to the Oracle table using the OracleEnterprise2 stage (load in job 1, update in job 2).

The number of updates is greater than the number of inserts into the table.

Is there any way I can speed up the process?

Thanks
Truncate and bulk load a staging table with a similar schema, then use the Oracle MERGE command to populate the production table: WHEN MATCHED on the primary key, update the existing rows; WHEN NOT MATCHED, insert into the production table.
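
A minimal sketch of such a MERGE, with placeholder table and column names:

    MERGE INTO prod_table p
    USING stage_table s
    ON (p.key_col = s.key_col)
    WHEN MATCHED THEN
      UPDATE SET p.col2 = s.col2, p.col3 = s.col3
    WHEN NOT MATCHED THEN
      INSERT (key_col, col2, col3)
      VALUES (s.key_col, s.col2, s.col3);

A single set-based MERGE like this gives Oracle one parsed statement to run over the whole staging table, instead of millions of individually parsed inserts and updates.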

Posted: Fri Oct 10, 2008 7:34 am
by sheema
Thanks.
How is the MERGE command supposed to run? Will it be run manually by the DBA to load the table?


Thanks

Posted: Fri Oct 10, 2008 7:40 am
by Ultramundane
sheema wrote: Thanks.
How is the MERGE command supposed to run? Will it be run manually by the DBA to load the table?


Thanks
Define manually? For instance, you could write your own shell script, use the External Source stage to invoke the MERGE command, capture the return code, and feed that return code back to DataStage; any results written to standard output can be sent down a link and into a sequential file. You could use a single varchar column with no delimiter to capture the results. Then you can still invoke this job through a DataStage sequencer if that is a must.
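
As a sketch of that idea, assuming sqlplus is available on the engine and using placeholder connection details, table names, and file names, the wrapper might look like:

    #!/bin/ksh
    # merge_load.ksh - run the MERGE and pass its return code back to DataStage
    sqlplus -s scott/tiger@orcl @merge_load.sql
    rc=$?
    echo "MERGE return code: $rc"    # standard output goes down the link
    exit $rc

with merge_load.sql containing:

    WHENEVER SQLERROR EXIT SQL.SQLCODE
    MERGE INTO prod_table p
    USING stage_table s
    ON (p.key_col = s.key_col)
    WHEN MATCHED THEN UPDATE SET p.col2 = s.col2
    WHEN NOT MATCHED THEN INSERT (key_col, col2) VALUES (s.key_col, s.col2);
    EXIT 0

EXIT 0 commits and returns success; any SQL error makes sqlplus exit non-zero, which the script passes straight back to the job.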