Poor performance - Extracting Oracle Table

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

anu123
Premium Member
Posts: 143
Joined: Sun Feb 05, 2006 1:05 pm
Location: Columbus, OH, USA

Poor performance - Extracting Oracle Table

Post by anu123 »

One of our jobs is taking forever to extract from an Oracle table that has nearly 10 billion rows. We are trying to do an initial load to the target (Netezza), but it just keeps running. So we tried to pull a date range of about 3 months, and the job ran for 10 hours to pull 300 million rows.

It is a very large table, and I cannot ask the source system to partition or index it.

Are there any settings in the Oracle Connector or at the project level to make this run faster? Or any other extraction strategy for the initial load?

BTW, our source is Oracle and target is Netezza.


Thanks in advance.
Thank you,
Anu
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

Are there any intermediate stages between the source and the target?
How are you determining that the extraction is what is consuming the time?
Instead of Netezza as the target, keep a Data Set stage as the target and run the job. Then you can narrow down where the time is actually being spent.
Thanks,
Prasanna
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Is it any faster using an Oracle client (e.g. TOAD) from the DataStage server?

If not, you may have network issues that need to be addressed.

Note, too, that it's not the time for the first row to appear; it's the time for all rows to arrive that you're interested in.
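
A simple way to get that baseline from the DataStage server is to time the same query in SQL*Plus there. This is only a sketch; the schema, table, and date column names below are placeholders, not your actual objects:

Code: Select all

-- Run from SQL*Plus on the DataStage server to time the full fetch,
-- independent of the job design. Object names are placeholders.
SET TIMING ON
SET AUTOTRACE TRACEONLY STATISTICS  -- fetch every row but suppress the output
SELECT *
FROM   src_schema.big_table
WHERE  load_dt >= DATE '2013-01-01'
AND    load_dt <  DATE '2013-04-01';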
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Without an index over your filter criteria (whatever date column that is), all you've got is a full table scan.
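
To confirm that, you can look at the execution plan. A minimal sketch, with placeholder object names, that should show TABLE ACCESS FULL for an unindexed date filter:

Code: Select all

-- Placeholders for your actual table and date column; look for
-- "TABLE ACCESS FULL" in the plan output.
EXPLAIN PLAN FOR
SELECT *
FROM   src_schema.big_table
WHERE  load_dt >= DATE '2013-01-01';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);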
-craig

"You can never have too many knives" -- Logan Nine Fingers
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

How many bytes are in each row? A wide 10 billion row table is going to take quite some time to extract. A fairly skinny 10 billion row table should be manageable.

Have you tried to perform a parallel read? In the connector, enable parallel read, choose the rowid hash method, and set the partition table name to your table name. It may take a while to start up, as it is going to open a connection for every node in your APT_CONFIG_FILE and pull rows based on the ROWID. This should be faster.
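
Conceptually, a rowid-hash partitioned read splits the table across your nodes with something along these lines. This is only an illustration of the idea, not the exact SQL the connector generates; the hash function, node count, and object names are assumptions:

Code: Select all

-- Rough idea of a rowid-hash partitioned read on a 4-node configuration;
-- each node runs the same query with its own node number (0..3).
SELECT *
FROM   src_schema.big_table
WHERE  MOD(ORA_HASH(ROWID), 4) = 0;  -- node 0's share of the rows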
anu123
Premium Member
Posts: 143
Joined: Sun Feb 05, 2006 1:05 pm
Location: Columbus, OH, USA

Post by anu123 »

Prasanna,
No stages in between.
We tried that approach and saw better performance.

Craig,

I think there are some network issues as well. The source, target, and DataStage servers are located at different sites with limited bandwidth between them.

As for indexing, I am not sure it's a good idea to ask them to create indexes on such a table.


kwwilliams, each row is pretty wide, with more than 100 columns including some bigint and varchar(50-80) columns.

Thanks,
Anu.
Thank you,
Anu
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

anu123 wrote: As for indexing, I am not sure it's a good idea to ask them to create indexes on such a table.
Well... then you are stuck doing full table scans to find the rows you want to extract (even doing it 'parallel'), and you will have to live with the performance of that.
-craig

"You can never have too many knives" -- Logan Nine Fingers
myukassign
Premium Member
Posts: 238
Joined: Fri Jul 25, 2008 8:55 am

Post by myukassign »

Hey... you might need to think a bit differently in this case. As you are looking at a one-time load, you might want to ask the DBA for a full dump into some sort of file structure. The DBA has the privileges to take a direct dump from the database and apply a cut-off date to it.

Zip the file, drop it on the DataStage server, and do a plain load. Then run your daily jobs from the cut-off point onward.
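
For example, the cut-off could be expressed as a simple predicate that the DBA applies when taking the dump; with Data Pump the same predicate can go into the QUERY parameter of expdp. The column name and date below are made up for illustration:

Code: Select all

-- Illustrative only: load_dt and the cut-off date are placeholders.
SELECT *
FROM   src_schema.big_table
WHERE  load_dt < DATE '2013-06-01';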
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Bottom line (IMHO): you should be discussing this with your DBA rather than trying to decide on your own what to even ask them. I'm sure they're aware of the table structure and volume, so explain your needs and see what they recommend. Don't focus on DataStage, just on the process, as the tool being used isn't really a consideration here.

It's important to make clear to them whether this is a one-time process or something you'll need to do on an ongoing basis. You said 'initial load', so I'm assuming there will be incremental extracts after that; without something in the way of indexing or partitioning to help with that, you are going to see this same 'poor performance' every time.
-craig

"You can never have too many knives" -- Logan Nine Fingers
anu123
Premium Member
Posts: 143
Joined: Sun Feb 05, 2006 1:05 pm
Location: Columbus, OH, USA

Post by anu123 »

Thank you all for the inputs.

We are planning to meet with the DBA group to discuss this further.

Craig, you are correct. If the source table is not indexed, I will probably see the same issue even with the delta pulls.

Like I said, we are planning to meet with the DBAs and will keep you posted.

Thanks again.
Thank you,
Anu