Performance in Extraction
Moderators: chulett, rschirm, roy
Performance in Extraction
Hi,
I have a very simply ds job. It has one Oracle stage and one Sequential file stage. I was trying to extract records from Oracle server and write to sequential file, since there is no transformation needed, I omitted the transform stage.
The problem I am getting is that when I ran the query on Oracle client(Toad), the result could come back under 1 minute. When I was running it on datastage, the row/second is 5. The datastage server and oracle server are on the same box, and the system is not very busy. I could not figure why it is running so slow in datastage?
Can somebody give me an idea how to improve the performance?
Thanks,
Carol
I have a very simply ds job. It has one Oracle stage and one Sequential file stage. I was trying to extract records from Oracle server and write to sequential file, since there is no transformation needed, I omitted the transform stage.
The problem I am getting is that when I ran the query on Oracle client(Toad), the result could come back under 1 minute. When I was running it on datastage, the row/second is 5. The datastage server and oracle server are on the same box, and the system is not very busy. I could not figure why it is running so slow in datastage?
Can somebody give me an idea how to improve the performance?
Thanks,
Carol
Tried that. Still does not work. And I also notice a very strange thing. If I don't specify order by in my query, the datastage job starting to write to sequential very soon after I start my job, but at very low speed. If I specify order, it hangs there for almost 3 hours then started to write to sequential file, but at very high speed. Don't know what's the reason.DSguru2B wrote:if no function is performed on user defined SQL, why use it. Simple just use the generated sql and then see the difference in performance.
What is the size of your temp DB? Is it large enough? Also, the speed with which the job writes to the sequential file depends on the number of columns and average length of each record you are processing apart from the number of rows in the source file. The rows/sec in the director client is relative and changes based upon these statistics.
HTH
HTH
Kris
Where's the "Any" key?-Homer Simpson
Where's the "Any" key?-Homer Simpson
It's the total time to get the LAST ROW that matters. If you do SELECT * from a table, it is no indication that Toad can extract 100M rows in 1 second, it just means that's how long it takes to find the first row to output. In DataStage, you're seeing the time it takes to find the rows and send them across the network.
Just because Toad returns rows quickly doesn't mean that Toad is faster. By putting an ORDER BY into the query, the database has to first go get all of the rows it needs. I'm sure if you timed Toad and DataStage you would see that they both get te result set in the same amount of time, the remaining time is simply writing the rows to the table.
Just because Toad returns rows quickly doesn't mean that Toad is faster. By putting an ORDER BY into the query, the database has to first go get all of the rows it needs. I'm sure if you timed Toad and DataStage you would see that they both get te result set in the same amount of time, the remaining time is simply writing the rows to the table.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Without ORDER BY the database server can deliver the first row as soon as it finds it.
With ORDER BY the database server can not deliver any row until it has processed all rows to determine which is the first in sorted order.
DataStage's clock is running the whole time that the sorting is being performed, even though no rows are being forwarded from the database, which is why rows/sec seems to be slow for the job.
With ORDER BY the database server can not deliver any row until it has processed all rows to determine which is the first in sorted order.
DataStage's clock is running the whole time that the sorting is being performed, even though no rows are being forwarded from the database, which is why rows/sec seems to be slow for the job.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Re: Performance in Extraction
TOAD by default returns the first 100 rows it finds whereas when you run the job it tries to return the entire set.yiminghu wrote:Hi,
I have a very simply ds job. It has one Oracle stage and one Sequential file stage. I was trying to extract records from Oracle server and write to sequential file, since there is no transformation needed, I omitted the transform stage.
The problem I am getting is that when I ran the query on Oracle client(Toad), the result could come back under 1 minute. When I was running it on datastage, the row/second is 5. The datastage server and oracle server are on the same box, and the system is not very busy. I could not figure why it is running so slow in datastage?
Can somebody give me an idea how to improve the performance?
Thanks,
Carol
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact: