ORACLE9i issue
Moderators: chulett, rschirm, roy
ORACLE9i issue
Hello All,
I have a select query with a simple where clause which retrieves 40 million rows out of 88 million. In DataStage I am using the ORACLE9i stage to extract the data into a flat file. This job takes 5 hours to complete. Is there anything that should be taken care of for faster retrieval?
I heard that adding an index to the columns we are using in the where clause will improve the performance. If that is true and it is the only option, then as a developer can we do it ourselves, or do we have to request the DBA to add the index?
Hope I can get some help on this issue.
thanks
Re: ORACLE9i issue
I doubt whether an index would improve your situation if you have to extract about half of the table. Indexes are good in Oracle when you need to extract up to about 15% of the table; beyond that, full table scans are usually faster (and this is only a rule of thumb, not hard statistics).
But that's probably for your DBA's to test.
Anyway, 40 million rows in 5 hours is about 2,222 rows per second, which is pretty good for DataStage.
Ogmios
In theory there's no difference between theory and practice. In practice there is.
You need to add a ranging where clause and run multiple instances of your job; that way you have multiple connections to the table, each extracting a portion of the data.
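The ranging approach above can be sketched as follows. The key column name (CUST_ID) and its min/max values are assumptions — a real job would read them from the table or pass them in as job parameters:

```shell
#!/bin/sh
# Split a numeric key range into N partitions, one per job instance.
# KEY_MIN, KEY_MAX and the CUST_ID column are hypothetical stand-ins.
KEY_MIN=1
KEY_MAX=88000000
INSTANCES=4

SPAN=$(( (KEY_MAX - KEY_MIN + 1) / INSTANCES ))
i=0
while [ $i -lt $INSTANCES ]; do
    LO=$(( KEY_MIN + i * SPAN ))
    if [ $i -eq $(( INSTANCES - 1 )) ]; then
        HI=$KEY_MAX                 # last partition absorbs any remainder
    else
        HI=$(( LO + SPAN - 1 ))
    fi
    # Each instance would be started with its own range as job parameters.
    echo "Instance $i: WHERE CUST_ID BETWEEN $LO AND $HI"
    i=$(( i + 1 ))
done
```

Each generated predicate would be passed to one instance of the multiple-instance job, so the four instances together cover the whole table without overlap.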
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
You're not improving the speed. What you will be doing is distributing the effort across multiple identical processes, each handling a specific set of the data. This concept is called "partitioned parallelism". You have partitioned the data, and by running multiple identical processes simultaneously (parallelism), achieved a result in less time.
You have not sped things up!!! You have just finished quicker.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Re: ORACLE9i issue
ogmios wrote: Anyway 40 million rows in 5 hours is about 2222 rows per second, which is pretty good for DataStage
But if they were using a real data warehouse database, such as Sybase IQ, this value would be much higher ("and he will persist until all are converted").
dnzl
"what the thinker thinks, the prover proves" - Robert Anton Wilson
"what the thinker thinks, the prover proves" - Robert Anton Wilson
I ran the extract job with "allow multiple instance" checked in the job properties in our development environment, where we have around 4 million records. I compared the timings with the previous run; this job took 2 minutes more than the previous one.
Is there any other way of running the job in multiple instances other than checking the check box in the job properties? Will it harm anything if we check "allow multiple instance" for all the jobs we have, whether they really need it or not?
thanks
There is some overhead for multiple instance jobs, especially in the log files and status, so it does hurt performance. Some jobs will not run as multiple instances. If you have a job which clears a hash file, then a second instance will erase some or all of the first instance's results, depending on whether it runs after or at the same time as the other instance.
Mamu Kim
rcil, I don't think you got Kenneth's idea... he suggests splitting your processing into more jobs, each extracting a piece of the data. Instead of one process extracting all the data after a certain date, you could have e.g. 4 processes, each extracting one of 4 parts of your data based on a date range.
But this is not done automatically for you when you switch on "allow multiple instances". This is something you have to create yourself: enable "allow multiple instance", change your job to take the begin and end of a range as parameters, and then start it up multiple times with different arguments at the same time.
And afterwards you have to put the output together as well, e.g. by executing a shell script to cat the output files together.
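The stitch-together step described above might look like this. The per-instance file names are hypothetical, and stand-in files are created here so the sketch is self-contained:

```shell
#!/bin/sh
# Sketch: combine per-instance extract files into one flat file.
# extract_0.dat .. extract_3.dat are hypothetical names -- substitute
# whatever file name pattern your job instances actually write.
OUTDIR=${TMPDIR:-/tmp}/extract_demo
mkdir -p "$OUTDIR"

# Stand-ins for the files each job instance would have produced.
for i in 0 1 2 3; do
    echo "rows from instance $i" > "$OUTDIR/extract_$i.dat"
done

# Concatenate in partition order, so the combined file looks like the
# output of a single-stream extract.
cat "$OUTDIR"/extract_0.dat "$OUTDIR"/extract_1.dat \
    "$OUTDIR"/extract_2.dat "$OUTDIR"/extract_3.dat \
    > "$OUTDIR/extract_all.dat"
```

Concatenating in partition order only matters if downstream processing cares about row order; if not, `cat "$OUTDIR"/extract_*.dat` is enough.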
Ogmios
In theory there's no difference between theory and practice. In practice there is.
Of course, a database unload might be the fastest possibility. I don't know Oracle well enough to say whether conditional unloading is possible. It certainly is in Red Brick - the unload is performed at a physical level and is mega-fast!
Red Brick is a trademark of International Business Machines Corporation.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Horrorcle needs a bulk-unloader. Visit http://asktom.oracle.com and get his code for a Pro*C bulk-unloader. Or, shell out some bucks and buy CoSort's multi-threaded bulk-unloader for Horrorcle. Another option is to upgrade to PX, which has similar multi-processing capabilities to get data out quickly. Otherwise, it's the "roll-your-own" technique as I described.
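For reference, the simplest "roll-your-own" unload is a plain SQL*Plus spool script. This is only a sketch (the table, columns, and date filter are made up) and it will be slower than a true bulk unloader, but it avoids the DataStage stage entirely:

```sql
-- Minimal SQL*Plus spool-unload sketch; big_table, cust_id, cust_name and
-- load_date are hypothetical. Run as: sqlplus user/pass @unload.sql
SET HEADING OFF
SET FEEDBACK OFF
SET PAGESIZE 0
SET LINESIZE 2000
SET TRIMSPOOL ON
SET TERMOUT OFF
SPOOL extract.dat
SELECT cust_id || '|' || cust_name
  FROM big_table
 WHERE load_date >= TO_DATE('2004-01-01', 'YYYY-MM-DD');
SPOOL OFF
EXIT
```

The same ranging trick applies here too: run several of these in parallel, each with its own WHERE range, and cat the spool files together afterwards.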
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
May I ask a simple question: what is the target environment?
Is it:
1) Another Oracle table in the same database/server, or something close to this?
2) A flat file to be processed by other software, and if so, what software (and what is it doing)?
An index will not work, since the number of rows retrieved is too large (as already mentioned).
What degree of parallelism is the query running with?
The fastest solution is an unload with an exception, or an unload plus an external program to remove the rows that are not interesting (unless it is #1 above, because then I would do an INSERT SELECT instead).
Best Regards
Lars Hedberg
I'm a Rock Climber on my free time