Huge Number of Columns -- Performance Issue

Post questions here relating to DataStage Server Edition, in such areas as Server job design, DS BASIC, Routines, Job Sequences, etc.

DS_SUPPORT
Premium Member
Posts: 232
Joined: Fri Aug 04, 2006 1:20 am
Location: Bangalore

Huge Number of Columns -- Performance Issue

Post by DS_SUPPORT »

My source is Oracle and my target is also Oracle, but they are on different servers.

One of the source tables has around 200 columns, and the data volume is also huge (around 7 million rows). We need to extract the data incrementally, but we don't have a timestamp column to identify newly inserted or modified rows, so we deploy CRC logic to identify the new or modified rows.
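
For context, the CRC logic is roughly equivalent to computing a hash over the concatenated column values in the source query and comparing it with the value stored from the previous run; a minimal sketch, assuming Oracle's ORA_HASH function and placeholder column names:

-- key_col and col1..col3 are illustrative; the real table has ~200 columns
select key_col,
       ORA_HASH(col1 || '|' || col2 || '|' || col3) as row_crc
from   source_table;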

Because the number of columns and the data volume are both huge, identifying the incremental rows takes a very long time.

I did some analysis and found that we don't have a problem writing to the target; the bottleneck is only when fetching from the source.

Irrespective of the Array Size setting, the number of rows fetched per second is very low. I am now working on splitting the job into multiple instances, but some instances are failing for unknown reasons, such as communication failures.
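
The split I am attempting gives each instance its own slice of the data via a MOD filter on a numeric key in the source query; a rough sketch, with placeholder names, assuming four instances:

-- Instance 1 of 4; the other instances use remainders 1, 2 and 3
select *
from   source_table
where  MOD(key_col, 4) = 0;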

Please provide some input on tuning the job.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Huge number of ANYTHING = performance issue.

It's supply and demand again.

Do you really need all those columns?

Have you investigated IBM's CDC for Oracle?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DS_SUPPORT
Premium Member
Posts: 232
Joined: Fri Aug 04, 2006 1:20 am
Location: Bangalore

Post by DS_SUPPORT »

But IBM's CDC license will cost extra, right? I need to enhance the job with Server Edition alone.

Yes, I need to bring all the columns from source to target.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Then you are stuck with processing the huge number of columns.

C'est la vie.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
swapnilverma
Participant
Posts: 135
Joined: Tue Aug 14, 2007 4:27 am
Location: Mumbai

Post by swapnilverma »

Hi

To identify new records you can use a MINUS query between the source and target tables:

select field1, field2, ..., field_n
from source_table
MINUS
select field1, field2, ..., field_n
from target_table;

For this you must have key columns.

NOTE: This gives you only the new (INSERT) data; you will not get any updated data.


With 200 columns and 7 million records to process, I would say it's time for a redesign of your job.

Add Load_dt and Update_dt columns to your tables and process update and insert records separately, as sketched below.
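
A minimal sketch of the audit columns, assuming a generic Oracle table name:

-- your_table is a placeholder; Load_dt is stamped on insert,
-- and the load job maintains Update_dt on every update
ALTER TABLE your_table ADD (
    load_dt   DATE DEFAULT SYSDATE,
    update_dt DATE
);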

Hope this helps.
Thanks
Swapnil

"Whenever you find whole world against you just turn around and Lead the world"