Performance in Extraction

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
yiminghu
Participant
Posts: 91
Joined: Fri Nov 14, 2003 10:09 am

Performance in Extraction

Post by yiminghu »

Hi,

I have a very simply ds job. It has one Oracle stage and one Sequential file stage. I was trying to extract records from Oracle server and write to sequential file, since there is no transformation needed, I omitted the transform stage.

The problem I am getting is that when I ran the query on Oracle client(Toad), the result could come back under 1 minute. When I was running it on datastage, the row/second is 5. The datastage server and oracle server are on the same box, and the system is not very busy. I could not figure why it is running so slow in datastage?

Can somebody give me an idea how to improve the performance?

Thanks,

Carol
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

What did you set your Array size to, in the OCI Output Tab? Increase it and see if it helps.
Kris

Where's the "Any" key?-Homer Simpson
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

How many rows are you extracting? Do you have user defined sql?
Are you performing any functions in the user defined sql?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
yiminghu
Participant
Posts: 91
Joined: Fri Nov 14, 2003 10:09 am

Post by yiminghu »

kris007 wrote:What did you set your Array size to, in the OCI Output Tab? Increase it and see if it helps.
My initial size is 1000, later, I increase to 10000. The speed is 30 row/second now. Still not good.
yiminghu
Participant
Posts: 91
Joined: Fri Nov 14, 2003 10:09 am

Post by yiminghu »

DSguru2B wrote:How many rows are you extracting? Do you have user defined sql?
Are you performing any functions in the user defined sql?
The number of row I'm extracting is around 300,000. It is used defined sql, but no function is performed in user defined sql.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

if no function is performed on user defined SQL, why use it. Simple just use the generated sql and then see the difference in performance.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
yiminghu
Participant
Posts: 91
Joined: Fri Nov 14, 2003 10:09 am

Post by yiminghu »

DSguru2B wrote:if no function is performed on user defined SQL, why use it. Simple just use the generated sql and then see the difference in performance.
Tried that. Still does not work. And I also notice a very strange thing. If I don't specify order by in my query, the datastage job starting to write to sequential very soon after I start my job, but at very low speed. If I specify order, it hangs there for almost 3 hours then started to write to sequential file, but at very high speed. Don't know what's the reason.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Its all because, it orders it collect it in buffer, and hence takes time to consolidate the whole set, and have a quick write.
What does the total time taken for the order by in th query and with out it?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

What is the size of your temp DB? Is it large enough? Also, the speed with which the job writes to the sequential file depends on the number of columns and average length of each record you are processing apart from the number of rows in the source file. The rows/sec in the director client is relative and changes based upon these statistics.

HTH
Kris

Where's the "Any" key?-Homer Simpson
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

It's the total time to get the LAST ROW that matters. If you do SELECT * from a table, it is no indication that Toad can extract 100M rows in 1 second, it just means that's how long it takes to find the first row to output. In DataStage, you're seeing the time it takes to find the rows and send them across the network.

Just because Toad returns rows quickly doesn't mean that Toad is faster. By putting an ORDER BY into the query, the database has to first go get all of the rows it needs. I'm sure if you timed Toad and DataStage you would see that they both get te result set in the same amount of time, the remaining time is simply writing the rows to the table.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Without ORDER BY the database server can deliver the first row as soon as it finds it.

With ORDER BY the database server can not deliver any row until it has processed all rows to determine which is the first in sorted order.

DataStage's clock is running the whole time that the sorting is being performed, even though no rows are being forwarded from the database, which is why rows/sec seems to be slow for the job.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Chuah
Participant
Posts: 46
Joined: Thu May 18, 2006 9:13 pm
Location: Melbourne

Re: Performance in Extraction

Post by Chuah »

yiminghu wrote:Hi,

I have a very simply ds job. It has one Oracle stage and one Sequential file stage. I was trying to extract records from Oracle server and write to sequential file, since there is no transformation needed, I omitted the transform stage.

The problem I am getting is that when I ran the query on Oracle client(Toad), the result could come back under 1 minute. When I was running it on datastage, the row/second is 5. The datastage server and oracle server are on the same box, and the system is not very busy. I could not figure why it is running so slow in datastage?

Can somebody give me an idea how to improve the performance?

Thanks,

Carol
TOAD by default returns the first 100 rows it finds whereas when you run the job it tries to return the entire set.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Try setting array size to 100 (the same as TOAD) or even 1.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply