Poor performance - Extracting Oracle Table

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

anu123
Premium Member
Posts: 143
Joined: Sun Feb 05, 2006 1:05 pm
Location: Columbus, OH, USA

Poor performance - Extracting Oracle Table

Post by anu123 »

One of our jobs is taking forever to extract from an Oracle table that has nearly 10 billion rows. We are trying to do an initial load to the target (Netezza), but it just keeps running. So we tried to pull a date range of about 3 months, and the job ran for 10 hours to pull 300 million rows.

It is a very large table, and I cannot ask the source system to partition or index it.

Are there any settings in the Oracle Connector or at the project level to make this run faster? Or any other extraction strategy for the initial load?

BTW, our source is Oracle and target is Netezza.


Thanks in advance.
Thank you,
Anu
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

Are there any intermediate stages between the source and the target?
How are you determining that the extraction is what is consuming the time?
Instead of Netezza as the target, keep a Data Set stage as the target and run the job. Then you can narrow down where the time is actually being spent.
Thanks,
Prasanna
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Is it any faster using an Oracle client (e.g. TOAD) from the DataStage server?

If not, you may have network issues that need to be addressed.

Note, too, that it's not the time for the first row to appear; it's the time for all rows to arrive that you're interested in.
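
A simple way to get that baseline from the DataStage server is to time the same query in SQL*Plus there. This is only a sketch; the schema, table, and date column names below are placeholders, not your actual objects:

Code: Select all

-- Run from SQL*Plus on the DataStage server to time the full fetch,
-- independent of the job design. Object names are placeholders.
SET TIMING ON
SET AUTOTRACE TRACEONLY STATISTICS  -- fetch every row but suppress the output
SELECT *
FROM   src_schema.big_table
WHERE  load_dt >= DATE '2013-01-01'
AND    load_dt <  DATE '2013-04-01';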
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Without an index over your filter criteria (whatever date column that is), all you've got is a full table scan.
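
To confirm that, you can look at the execution plan. A minimal sketch, with placeholder object names, that should show TABLE ACCESS FULL for an unindexed date filter:

Code: Select all

-- Placeholders for your actual table and date column; look for
-- "TABLE ACCESS FULL" in the plan output.
EXPLAIN PLAN FOR
SELECT *
FROM   src_schema.big_table
WHERE  load_dt >= DATE '2013-01-01';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);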
-craig

"You can never have too many knives" -- Logan Nine Fingers
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

How many bytes are in each row? A wide 10 billion row table is going to take quite some time to extract. A fairly skinny 10 billion row table should be manageable.

Have you tried to perform a parallel read? In the connector, enable parallel read, choose the rowid hash method, and set the partition table name to your table name. It may take a while to start up, as it is going to open a connection for every node in your APT_CONFIG_FILE and pull rows based on the ROWID. This should be faster.
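
Conceptually, a rowid-hash partitioned read splits the table across your nodes with something along these lines. This is only an illustration of the idea, not the exact SQL the connector generates; the hash function, node count, and object names are assumptions:

Code: Select all

-- Rough idea of a rowid-hash partitioned read on a 4-node configuration;
-- each node runs the same query with its own node number (0..3).
SELECT *
FROM   src_schema.big_table
WHERE  MOD(ORA_HASH(ROWID), 4) = 0;  -- node 0's share of the rows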
anu123
Premium Member
Posts: 143
Joined: Sun Feb 05, 2006 1:05 pm
Location: Columbus, OH, USA

Post by anu123 »

Prasanna,
No stages in between.
We tried that approach and saw better performance.

Craig,

I think there are some network issues as well. The source, target, and DataStage servers are located at different sites with limited bandwidth between them.

As for indexing, I am not sure it's a good idea to ask them to create indexes on such a table.


kwwilliams, each row is pretty wide, with more than 100 columns including some bigint and varchar(50-80) columns.

Thanks,
Anu.
Thank you,
Anu
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

anu123 wrote: As for indexing, I am not sure it's a good idea to ask them to create indexes on such a table.
Well... then you are stuck doing full table scans to find the rows you want to extract (even doing it 'parallel'), and you will have to live with the performance of that.
-craig

"You can never have too many knives" -- Logan Nine Fingers
myukassign
Premium Member
Posts: 238
Joined: Fri Jul 25, 2008 8:55 am

Post by myukassign »

Hey... you might need to think a bit differently in this case. As you are looking at a one-time load, you might want to ask the DBA for a full dump into some sort of file structure. The DBA has the privileges to take a direct dump from the database and apply a cut-off date to it.

Zip the file, drop it on the DataStage server, and do a plain load. Then run your daily jobs from the cut-off point onward.
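
For example, the cut-off could be expressed as a simple predicate that the DBA applies when taking the dump; with Data Pump the same predicate can go into the QUERY parameter of expdp. The column name and date below are made up for illustration:

Code: Select all

-- Illustrative only: load_dt and the cut-off date are placeholders.
SELECT *
FROM   src_schema.big_table
WHERE  load_dt < DATE '2013-06-01';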
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Bottom line (IMHO): you should be discussing this with your DBA rather than trying to decide on your own what to even ask them. I'm sure they're aware of the table structure and volume, so explain your needs and see what they recommend. Don't focus on DataStage, just on the process, as the tool being used isn't really a consideration here.

It's important to make clear to them whether this is a one-time process or something you'll need to do on an ongoing basis. You said 'initial load', so I'm assuming there will be incremental extracts after that; without something in the way of indexing or partitioning to help with that, you are going to see this same 'poor performance' every time.
-craig

"You can never have too many knives" -- Logan Nine Fingers
anu123
Premium Member
Posts: 143
Joined: Sun Feb 05, 2006 1:05 pm
Location: Columbus, OH, USA

Post by anu123 »

Thank you all for the inputs.

We are planning to meet with the DBA group to discuss this further.

Craig, you are correct. If the source table is not indexed, I will probably see the same issue even with the delta pulls.

Like I said, we are planning to meet with the DBAs and will keep you posted.

Thanks again.
Thank you,
Anu