ORACLE9i issue

Post questions here relating to DataStage Server Edition, covering areas such as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

rcil
Charter Member
Posts: 70
Joined: Sat Jun 05, 2004 1:37 am

ORACLE9i issue

Post by rcil »

Hello All,

I have a select query with a simple where clause that retrieves 40 million rows out of 88 million. In DataStage I am using the ORACLE9i stage to extract the data into a flat file. This job takes 5 hours to complete. Is there anything I should do for faster retrieval?

I heard that adding an index to the columns we use in the where clause will improve performance. If that is true and it is the only option, can we as developers do it, or do we have to request the DBA to add the index?

Hope I can get some help on this issue.

thanks
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Re: ORACLE9i issue

Post by ogmios »

I doubt an index would improve your situation when you have to extract about half of the table. Indexes in Oracle pay off when you need to extract up to about 15% of the table; beyond that, full table scans are usually faster (and this is only a rule of thumb, not hard statistics).

But that's probably for your DBAs to test.

Anyway, 40 million rows in 5 hours is about 2,222 rows per second, which is pretty good for DataStage :wink:

Ogmios
In theory there's no difference between theory and practice. In practice there is.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

You need to add a ranging where clause and run multiple instances of your job; that way you have multiple connections to the table, each extracting a portion of the data.
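A rough sketch of what that ranging idea could look like, expressed as a shell snippet (the column name, boundary values, and four-way split are all hypothetical, chosen only to show non-overlapping slices, with the last slice also picking up NULLs):

```shell
#!/bin/sh
# Sketch of a "ranging where clause": each job instance is handed one
# non-overlapping slice of the key range. COLUMN4 and the boundary
# values below are hypothetical.

range_clause() {
  case "$1" in
    0) echo "COLUMN4 >= '200211' AND COLUMN4 < '200306'" ;;
    1) echo "COLUMN4 >= '200306' AND COLUMN4 < '200401'" ;;
    2) echo "COLUMN4 >= '200401' AND COLUMN4 < '200408'" ;;
    3) echo "COLUMN4 >= '200408' OR COLUMN4 IS NULL" ;;
  esac
}

# Each instance runs the same SELECT with its own slice appended.
for i in 0 1 2 3; do
  echo "instance $i: SELECT ... FROM big_table WHERE $(range_clause $i)"
done
```

Each instance then writes its own output file, and the files are stitched together afterwards.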
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
rcil
Charter Member
Posts: 70
Joined: Sat Jun 05, 2004 1:37 am

Post by rcil »

Thanks for the quick responses. I have 110 columns in the table, and the where clause I have is
((COLUMN4 >= '200211') OR (COLUMN4 IS NULL)).

If I run the job in multiple instances, will it improve the speed, or do I have to add anything else to the where clause?

thanks
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

You're not improving the speed. What you will be doing is distributing the effort across multiple identical processes, each handling a specific slice of the data. This concept is called "partitioned parallelism": you have partitioned the data and, by running multiple identical processes simultaneously (parallelism), achieved the result in less time.

You have not sped things up!!! You have just finished quicker.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
denzilsyb
Participant
Posts: 186
Joined: Mon Sep 22, 2003 7:38 am
Location: South Africa
Contact:

Re: ORACLE9i issue

Post by denzilsyb »

ogmios wrote: Anyway 40 million rows in 5 hours is about 2222 rows per second, which is pretty good for DataStage :wink:
But if they were using a real data warehouse database, this value would be much higher. :idea: Sybase IQ. :idea: "and he will persist until all are converted"
dnzl
"what the thinker thinks, the prover proves" - Robert Anton Wilson
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Sorry You Bought A Slow Engine
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
denzilsyb
Participant
Posts: 186
Joined: Mon Sep 22, 2003 7:38 am
Location: South Africa
Contact:

Post by denzilsyb »

kcbland wrote: Sorry You Bought A Slow Engine
:D
That has to be the first time I've seen that one! But luckily we have the stats to prove it is not the case.
dnzl
"what the thinker thinks, the prover proves" - Robert Anton Wilson
rcil
Charter Member
Posts: 70
Joined: Sat Jun 05, 2004 1:37 am

Post by rcil »

I ran the extract job with "allow multiple instance" checked in the job properties in our development environment, where we have around 4 million records. Comparing timings with the previous run, this job took 2 minutes longer than before.

Is there any other way of running the job as multiple instances other than checking the box in the job properties? Will it do any harm if we enable multiple instances for all of our jobs, whether they really need it or not?


thanks
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

There is some overhead for multiple-instance jobs, especially in the log files and status records, so it does hurt performance a little. Also, some jobs should not run as multiple instances: if a job clears a hash file, a second instance will erase some or all of the first instance's results, depending on whether it runs after or at the same time as the other instance.
Mamu Kim
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Post by ogmios »

rcil, I don't think you got Kenneth's idea... he suggests splitting your processing into several jobs, each extracting a piece of the data. Instead of one process extracting all data after a certain date, you could have, e.g., 4 processes, each extracting one of 4 parts of your data based on a date range.

But this is not done for you automatically when you switch on "allow multiple instances". It is something you have to build yourself: enable "allow multiple instances", change your job to take the begin and end of a range as parameters, and then start it up multiple times at the same time with different arguments.

And afterwards you have to put the output back together as well, e.g. by running a shell script to cat the output files together.
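A minimal sketch of that run-then-cat pattern in shell. The real start-up command would be DataStage's dsjob utility (shown only as a comment, with hypothetical project, job, and parameter names); a subshell that writes a partial file stands in for each instance here:

```shell
#!/bin/sh
# Start four "instances" in parallel, wait for all of them, then
# concatenate the partial outputs. A real run would replace the subshell
# with something like:
#   dsjob -run -param RangeNo=$i MyProject ExtractJob.$i
# (project, job, parameter, and file names here are hypothetical).

for i in 1 2 3 4; do
  ( echo "rows for range $i" > extract_part$i.dat ) &
done
wait   # do not cat until every instance has finished

cat extract_part1.dat extract_part2.dat \
    extract_part3.dat extract_part4.dat > extract_all.dat
```

The `wait` is the important part: concatenating before every instance has closed its file would give a truncated result.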

Ogmios
In theory there's no difference between theory and practice. In practice there is.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Of course, a database unload might be the fastest possibility. I don't know Oracle well enough to say whether conditional unloading is possible. It certainly is in :idea: Red Brick :idea: - the unload is performed at a physical level and is mega-fast!

Red Brick is a trademark of International Business Machines Corporation.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Horrorcle needs a bulk unloader. Visit http://asktom.oracle.com and get his code for a Pro*C bulk unloader. Or shell out some bucks and buy CoSort's multi-threaded bulk unloader for Horrorcle. Another option is to upgrade to PX, which has similar multi-processing capabilities to get data out quickly. Otherwise, it's the "roll your own" technique I described.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
badri
Participant
Posts: 19
Joined: Mon Jul 12, 2004 2:58 am

Post by badri »

Hi,

Indexing will help to some extent.

- :wink:
hedberg02
Participant
Posts: 3
Joined: Thu Sep 09, 2004 8:19 am
Location: Stockholm / Sweden
Contact:

Post by hedberg02 »

May I ask a simple question: what is the target environment?

Is it:
1) Another Oracle table in the same database/server, or something close to this?
2) A flat file to be processed by other software, and if so, what software (and what is it doing)?

An index will not help, since the number of rows retrieved is too large (as already mentioned).

What degree of parallelism is in use?

The fastest solution is an unload with a filter condition, or an unload plus an external program to remove the rows that are not interesting (unless it's #1 above, in which case I would do an INSERT ... SELECT instead).
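The "unload plus external program" route can be sketched in shell. The pipe-delimited layout and the field position are assumptions; the filter reproduces the thread's `COLUMN4 >= '200211' OR COLUMN4 IS NULL` condition, with an empty field standing in for NULL:

```shell
#!/bin/sh
# Unload everything fast, then filter outside the database.
# Assumes a pipe-delimited dump where field 4 is the period column.

# Fake a tiny unload file for illustration.
cat > dump.dat <<'EOF'
a|b|c|200301|x
a|b|c|200105|x
a|b|c||x
EOF

# Keep rows where field 4 >= '200211' or is empty (the NULL case);
# with a string constant on one side, awk compares lexically, which
# works here because the periods are fixed-width YYYYMM strings.
awk -F'|' '$4 >= "200211" || $4 == ""' dump.dat > filtered.dat
```

Of the three sample rows, the first and third pass the filter and the `200105` row is dropped.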
Best Regards
Lars Hedberg

I'm a Rock Climber on my free time ;)