Huge Number of Columns -- Performance Issue

Post questions here relating to DataStage Server Edition, in such areas as Server job design, DS BASIC, Routines, Job Sequences, etc.

DS_SUPPORT
Premium Member
Posts: 232
Joined: Fri Aug 04, 2006 1:20 am
Location: Bangalore

Huge Number of Columns -- Performance Issue

Post by DS_SUPPORT »

My source is Oracle and my target is also Oracle, but they are on different servers.

One of the source tables has around 200 columns, and the data volume is also huge (around 7 million rows). We need to extract the data incrementally, but we don't have a timestamp column to identify newly inserted or modified rows, so we deploy CRC logic to identify the new or modified rows.
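
For context, the CRC logic is roughly equivalent to computing a hash over the concatenated column values in the source query and comparing it with the value stored from the previous run; a minimal sketch, assuming Oracle's ORA_HASH function and placeholder column names:

-- key_col and col1..col3 are illustrative; the real table has ~200 columns
select key_col,
       ORA_HASH(col1 || '|' || col2 || '|' || col3) as row_crc
from   source_table;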

Because the number of columns and the data volume are both huge, identifying the incremental rows takes a very long time.

I did some analysis and found that we don't have a problem writing to the target; the bottleneck is only when fetching from the source.

Irrespective of the Array Size setting, the number of rows fetched per second is very low. I am now working on splitting the job into multiple instances, but some instances are failing for unknown reasons, such as communication failures.
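
The split I am attempting gives each instance its own slice of the data via a MOD filter on a numeric key in the source query; a rough sketch, with placeholder names, assuming four instances:

-- Instance 1 of 4; the other instances use remainders 1, 2 and 3
select *
from   source_table
where  MOD(key_col, 4) = 0;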

Please provide some input on tuning the job.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Huge number of ANYTHING = performance issue.

It's supply and demand again.

Do you really need all those columns?

Have you investigated IBM's CDC for Oracle?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DS_SUPPORT
Premium Member
Posts: 232
Joined: Fri Aug 04, 2006 1:20 am
Location: Bangalore

Post by DS_SUPPORT »

But IBM's CDC license will cost extra, right? I need to enhance the job with Server Edition alone.

Yes, I need to bring all the columns from source to target.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Then you are stuck with processing the huge number of columns.

C'est la vie.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
swapnilverma
Participant
Posts: 135
Joined: Tue Aug 14, 2007 4:27 am
Location: Mumbai

Post by swapnilverma »

Hi

To identify new records you can use a MINUS query between the source and target tables:

select field1, field2, ..., field_n
from source_table
MINUS
select field1, field2, ..., field_n
from target_table;

For this you must have key columns.

NOTE: This gives you only the new (INSERT) data; you will not get any updated data.


With 200 columns and 7 million records to process, I would say it's time for a redesign of your job.

Add Load_dt and Update_dt columns to your tables and process update and insert records separately, as sketched below.
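
A minimal sketch of the audit columns, assuming a generic Oracle table name:

-- your_table is a placeholder; Load_dt is stamped on insert,
-- and the load job maintains Update_dt on every update
ALTER TABLE your_table ADD (
    load_dt   DATE DEFAULT SYSDATE,
    update_dt DATE
);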

Hope this helps.
Thanks
Swapnil

"Whenever you find whole world against you just turn around and Lead the world"