Page 1 of 1

Performance in Extraction

Posted: Wed Jun 21, 2006 8:34 am
by yiminghu
Hi,

I have a very simply ds job. It has one Oracle stage and one Sequential file stage. I was trying to extract records from Oracle server and write to sequential file, since there is no transformation needed, I omitted the transform stage.

The problem I am getting is that when I ran the query on Oracle client(Toad), the result could come back under 1 minute. When I was running it on datastage, the row/second is 5. The datastage server and oracle server are on the same box, and the system is not very busy. I could not figure why it is running so slow in datastage?

Can somebody give me an idea how to improve the performance?

Thanks,

Carol

Posted: Wed Jun 21, 2006 8:37 am
by kris007
What did you set your Array size to, in the OCI Output Tab? Increase it and see if it helps.

Posted: Wed Jun 21, 2006 9:17 am
by DSguru2B
How many rows are you extracting? Do you have user defined sql?
Are you performing any functions in the user defined sql?

Posted: Wed Jun 21, 2006 9:35 am
by yiminghu
kris007 wrote:What did you set your Array size to, in the OCI Output Tab? Increase it and see if it helps.
My initial size is 1000, later, I increase to 10000. The speed is 30 row/second now. Still not good.

Posted: Wed Jun 21, 2006 9:36 am
by yiminghu
DSguru2B wrote:How many rows are you extracting? Do you have user defined sql?
Are you performing any functions in the user defined sql?
The number of row I'm extracting is around 300,000. It is used defined sql, but no function is performed in user defined sql.

Posted: Wed Jun 21, 2006 9:57 am
by DSguru2B
if no function is performed on user defined SQL, why use it. Simple just use the generated sql and then see the difference in performance.

Posted: Fri Jun 23, 2006 8:00 am
by yiminghu
DSguru2B wrote:if no function is performed on user defined SQL, why use it. Simple just use the generated sql and then see the difference in performance.
Tried that. Still does not work. And I also notice a very strange thing. If I don't specify order by in my query, the datastage job starting to write to sequential very soon after I start my job, but at very low speed. If I specify order, it hangs there for almost 3 hours then started to write to sequential file, but at very high speed. Don't know what's the reason.

Posted: Fri Jun 23, 2006 8:07 am
by kumar_s
Its all because, it orders it collect it in buffer, and hence takes time to consolidate the whole set, and have a quick write.
What does the total time taken for the order by in th query and with out it?

Posted: Fri Jun 23, 2006 8:09 am
by kris007
What is the size of your temp DB? Is it large enough? Also, the speed with which the job writes to the sequential file depends on the number of columns and average length of each record you are processing apart from the number of rows in the source file. The rows/sec in the director client is relative and changes based upon these statistics.

HTH

Posted: Fri Jun 23, 2006 8:36 am
by kcbland
It's the total time to get the LAST ROW that matters. If you do SELECT * from a table, it is no indication that Toad can extract 100M rows in 1 second, it just means that's how long it takes to find the first row to output. In DataStage, you're seeing the time it takes to find the rows and send them across the network.

Just because Toad returns rows quickly doesn't mean that Toad is faster. By putting an ORDER BY into the query, the database has to first go get all of the rows it needs. I'm sure if you timed Toad and DataStage you would see that they both get te result set in the same amount of time, the remaining time is simply writing the rows to the table.

Posted: Fri Jun 23, 2006 3:59 pm
by ray.wurlod
Without ORDER BY the database server can deliver the first row as soon as it finds it.

With ORDER BY the database server can not deliver any row until it has processed all rows to determine which is the first in sorted order.

DataStage's clock is running the whole time that the sorting is being performed, even though no rows are being forwarded from the database, which is why rows/sec seems to be slow for the job.

Re: Performance in Extraction

Posted: Sun Jun 25, 2006 8:40 pm
by Chuah
yiminghu wrote:Hi,

I have a very simply ds job. It has one Oracle stage and one Sequential file stage. I was trying to extract records from Oracle server and write to sequential file, since there is no transformation needed, I omitted the transform stage.

The problem I am getting is that when I ran the query on Oracle client(Toad), the result could come back under 1 minute. When I was running it on datastage, the row/second is 5. The datastage server and oracle server are on the same box, and the system is not very busy. I could not figure why it is running so slow in datastage?

Can somebody give me an idea how to improve the performance?

Thanks,

Carol
TOAD by default returns the first 100 rows it finds whereas when you run the job it tries to return the entire set.

Posted: Sun Jun 25, 2006 10:34 pm
by ray.wurlod
Try setting array size to 100 (the same as TOAD) or even 1.