ORACLE9i issue

Post questions here relating to DataStage Server Edition, covering areas such as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

rcil
Charter Member
Posts: 70
Joined: Sat Jun 05, 2004 1:37 am

ORACLE9i issue

Post by rcil »

Hello All,

I have a select query with a simple where clause that retrieves 40 million rows out of 88 million. In DataStage I am using the ORACLE9i stage to extract the data into a flat file. This job takes 5 hours to complete. Is there anything I should do for faster retrieval?

I heard that adding an index to the columns we use in the where clause will improve performance. If that is true and it is the only option, can we as developers do it, or do we have to request the DBA to add the index?

Hope I can get some help on this issue.

thanks
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Re: ORACLE9i issue

Post by ogmios »

I doubt an index would improve your situation when you have to extract about half of the table. Indexes in Oracle pay off when you need to extract up to about 15% of the table; beyond that, full table scans are usually faster (and this is only a rule of thumb, not hard statistics).

But that's probably for your DBAs to test.

Anyway, 40 million rows in 5 hours is about 2,222 rows per second, which is pretty good for DataStage :wink:

Ogmios
In theory there's no difference between theory and practice. In practice there is.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

You need to add a ranging where clause and run multiple instances of your job; that way you have multiple connections to the table, each extracting a portion of the data.
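A rough sketch of what that ranging idea could look like, expressed as a shell snippet (the column name, boundary values, and four-way split are all hypothetical, chosen only to show non-overlapping slices, with the last slice also picking up NULLs):

```shell
#!/bin/sh
# Sketch of a "ranging where clause": each job instance is handed one
# non-overlapping slice of the key range. COLUMN4 and the boundary
# values below are hypothetical.

range_clause() {
  case "$1" in
    0) echo "COLUMN4 >= '200211' AND COLUMN4 < '200306'" ;;
    1) echo "COLUMN4 >= '200306' AND COLUMN4 < '200401'" ;;
    2) echo "COLUMN4 >= '200401' AND COLUMN4 < '200408'" ;;
    3) echo "COLUMN4 >= '200408' OR COLUMN4 IS NULL" ;;
  esac
}

# Each instance runs the same SELECT with its own slice appended.
for i in 0 1 2 3; do
  echo "instance $i: SELECT ... FROM big_table WHERE $(range_clause $i)"
done
```

Each instance then writes its own output file, and the files are stitched together afterwards.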
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
rcil
Charter Member
Posts: 70
Joined: Sat Jun 05, 2004 1:37 am

Post by rcil »

Thanks for the quick responses. I have 110 columns in the table, and the where clause I have is
((COLUMN4 >= '200211') OR (COLUMN4 IS NULL)).

If I run the job in multiple instances, will it improve the speed, or do I have to add anything else to the where clause?

thanks
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

You're not improving the speed. What you will be doing is distributing the effort across multiple identical processes, each handling a specific slice of the data. This concept is called "partitioned parallelism": you have partitioned the data and, by running multiple identical processes simultaneously (parallelism), achieved the result in less time.

You have not sped things up!!! You have just finished quicker.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
denzilsyb
Participant
Posts: 186
Joined: Mon Sep 22, 2003 7:38 am
Location: South Africa
Contact:

Re: ORACLE9i issue

Post by denzilsyb »

ogmios wrote: Anyway 40 million rows in 5 hours is about 2222 rows per second, which is pretty good for DataStage :wink:
But if they were using a real data warehouse database, this value would be much higher. :idea: Sybase IQ. :idea: "and he will persist until all are converted"
dnzl
"what the thinker thinks, the prover proves" - Robert Anton Wilson
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Sorry You Bought A Slow Engine
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
denzilsyb
Participant
Posts: 186
Joined: Mon Sep 22, 2003 7:38 am
Location: South Africa
Contact:

Post by denzilsyb »

kcbland wrote: Sorry You Bought A Slow Engine
:D
That has to be the first time I've seen that one! But luckily we have the stats to prove it is not the case.
dnzl
"what the thinker thinks, the prover proves" - Robert Anton Wilson
rcil
Charter Member
Posts: 70
Joined: Sat Jun 05, 2004 1:37 am

Post by rcil »

I ran the extract job with "allow multiple instance" checked in the job properties in our development environment, where we have around 4 million records. Comparing timings with the previous run, this job took 2 minutes longer than before.

Is there any other way of running the job as multiple instances other than checking the box in the job properties? Will it do any harm if we enable multiple instances for all of our jobs, whether they really need it or not?


thanks
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

There is some overhead for multiple-instance jobs, especially in the log files and status records, so it does hurt performance a little. Also, some jobs should not run as multiple instances: if a job clears a hash file, a second instance will erase some or all of the first instance's results, depending on whether it runs after or at the same time as the other instance.
Mamu Kim
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Post by ogmios »

rcil, I don't think you got Kenneth's idea... he suggests splitting your processing into several jobs, each extracting a piece of the data. Instead of one process extracting all data after a certain date, you could have, e.g., 4 processes, each extracting one of 4 parts of your data based on a date range.

But this is not done for you automatically when you switch on "allow multiple instances". It is something you have to build yourself: enable "allow multiple instances", change your job to take the begin and end of a range as parameters, and then start it up multiple times at the same time with different arguments.

And afterwards you have to put the output back together as well, e.g. by running a shell script to cat the output files together.
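A minimal sketch of that run-then-cat pattern in shell. The real start-up command would be DataStage's dsjob utility (shown only as a comment, with hypothetical project, job, and parameter names); a subshell that writes a partial file stands in for each instance here:

```shell
#!/bin/sh
# Start four "instances" in parallel, wait for all of them, then
# concatenate the partial outputs. A real run would replace the subshell
# with something like:
#   dsjob -run -param RangeNo=$i MyProject ExtractJob.$i
# (project, job, parameter, and file names here are hypothetical).

for i in 1 2 3 4; do
  ( echo "rows for range $i" > extract_part$i.dat ) &
done
wait   # do not cat until every instance has finished

cat extract_part1.dat extract_part2.dat \
    extract_part3.dat extract_part4.dat > extract_all.dat
```

The `wait` is the important part: concatenating before every instance has closed its file would give a truncated result.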

Ogmios
In theory there's no difference between theory and practice. In practice there is.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Of course, a database unload might be the fastest possibility. I don't know Oracle well enough to say whether conditional unloading is possible. It certainly is in :idea: Red Brick :idea: - the unload is performed at a physical level and is mega-fast!

Red Brick is a trademark of International Business Machines Corporation.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Horrorcle needs a bulk unloader. Visit http://asktom.oracle.com and get his code for a Pro*C bulk unloader. Or shell out some bucks and buy CoSort's multi-threaded bulk unloader for Horrorcle. Another option is to upgrade to PX, which has similar multi-processing capabilities to get data out quickly. Otherwise, it's the "roll your own" technique I described.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
badri
Participant
Posts: 19
Joined: Mon Jul 12, 2004 2:58 am

Post by badri »

Hi,

Indexing will help to some extent.

- :wink:
hedberg02
Participant
Posts: 3
Joined: Thu Sep 09, 2004 8:19 am
Location: Stockholm / Sweden
Contact:

Post by hedberg02 »

May I ask a simple question: what is the target environment?

Is it:
1) Another Oracle table in the same database/server, or something close to this?
2) A flat file to be processed by other software, and if so, what software (and what is it doing)?

An index will not help, since the number of rows retrieved is too large (as already mentioned).

What degree of parallelism is in use?

The fastest solution is an unload with a filter condition, or an unload plus an external program to remove the rows that are not interesting (unless it's #1 above, in which case I would do an INSERT ... SELECT instead).
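The "unload plus external program" route can be sketched in shell. The pipe-delimited layout and the field position are assumptions; the filter reproduces the thread's `COLUMN4 >= '200211' OR COLUMN4 IS NULL` condition, with an empty field standing in for NULL:

```shell
#!/bin/sh
# Unload everything fast, then filter outside the database.
# Assumes a pipe-delimited dump where field 4 is the period column.

# Fake a tiny unload file for illustration.
cat > dump.dat <<'EOF'
a|b|c|200301|x
a|b|c|200105|x
a|b|c||x
EOF

# Keep rows where field 4 >= '200211' or is empty (the NULL case);
# with a string constant on one side, awk compares lexically, which
# works here because the periods are fixed-width YYYYMM strings.
awk -F'|' '$4 >= "200211" || $4 == ""' dump.dat > filtered.dat
```

Of the three sample rows, the first and third pass the filter and the `200105` row is dropped.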
Best Regards
Lars Hedberg

I'm a Rock Climber on my free time ;)