Performance in Extraction

yiminghu · Post by **yiminghu** » Wed Jun 21, 2006 8:34 am

Hi,

I have a very simply ds job. It has one Oracle stage and one Sequential file stage. I was trying to extract records from Oracle server and write to sequential file, since there is no transformation needed, I omitted the transform stage.

The problem I am getting is that when I ran the query on Oracle client(Toad), the result could come back under 1 minute. When I was running it on datastage, the row/second is 5. The datastage server and oracle server are on the same box, and the system is not very busy. I could not figure why it is running so slow in datastage?

Can somebody give me an idea how to improve the performance?

Thanks,

Carol

kris007 · Post by **kris007** » Wed Jun 21, 2006 8:37 am

What did you set your Array size to, in the OCI Output Tab? Increase it and see if it helps.

DSguru2B · Post by **DSguru2B** » Wed Jun 21, 2006 9:17 am

How many rows are you extracting? Do you have user defined sql?
Are you performing any functions in the user defined sql?

yiminghu · Post by **yiminghu** » Wed Jun 21, 2006 9:35 am

kris007 wrote:What did you set your Array size to, in the OCI Output Tab? Increase it and see if it helps.

My initial size is 1000, later, I increase to 10000. The speed is 30 row/second now. Still not good.

yiminghu · Post by **yiminghu** » Wed Jun 21, 2006 9:36 am

DSguru2B wrote:How many rows are you extracting? Do you have user defined sql?
Are you performing any functions in the user defined sql?

The number of row I'm extracting is around 300,000. It is used defined sql, but no function is performed in user defined sql.

DSguru2B · Post by **DSguru2B** » Wed Jun 21, 2006 9:57 am

if no function is performed on user defined SQL, why use it. Simple just use the generated sql and then see the difference in performance.

yiminghu · Post by **yiminghu** » Fri Jun 23, 2006 8:00 am

DSguru2B wrote:if no function is performed on user defined SQL, why use it. Simple just use the generated sql and then see the difference in performance.

Tried that. Still does not work. And I also notice a very strange thing. If I don't specify order by in my query, the datastage job starting to write to sequential very soon after I start my job, but at very low speed. If I specify order, it hangs there for almost 3 hours then started to write to sequential file, but at very high speed. Don't know what's the reason.

kumar_s · Post by **kumar_s** » Fri Jun 23, 2006 8:07 am

Its all because, it orders it collect it in buffer, and hence takes time to consolidate the whole set, and have a quick write.
What does the total time taken for the order by in th query and with out it?

kris007 · Post by **kris007** » Fri Jun 23, 2006 8:09 am

What is the size of your temp DB? Is it large enough? Also, the speed with which the job writes to the sequential file depends on the number of columns and average length of each record you are processing apart from the number of rows in the source file. The rows/sec in the director client is relative and changes based upon these statistics.

HTH

kcbland · Post by **kcbland** » Fri Jun 23, 2006 8:36 am

It's the total time to get the LAST ROW that matters. If you do SELECT * from a table, it is no indication that Toad can extract 100M rows in 1 second, it just means that's how long it takes to find the first row to output. In DataStage, you're seeing the time it takes to find the rows and send them across the network.

Just because Toad returns rows quickly doesn't mean that Toad is faster. By putting an ORDER BY into the query, the database has to first go get all of the rows it needs. I'm sure if you timed Toad and DataStage you would see that they both get te result set in the same amount of time, the remaining time is simply writing the rows to the table.

ray.wurlod · Post by **ray.wurlod** » Fri Jun 23, 2006 3:59 pm

Without ORDER BY the database server can deliver the first row as soon as it finds it.

With ORDER BY the database server can not deliver any row until it has processed all rows to determine which is the first in sorted order.

DataStage's clock is running the whole time that the sorting is being performed, even though no rows are being forwarded from the database, which is why rows/sec seems to be slow for the job.

Chuah · Post by **Chuah** » Sun Jun 25, 2006 8:40 pm

yiminghu wrote:Hi,

I have a very simply ds job. It has one Oracle stage and one Sequential file stage. I was trying to extract records from Oracle server and write to sequential file, since there is no transformation needed, I omitted the transform stage.

The problem I am getting is that when I ran the query on Oracle client(Toad), the result could come back under 1 minute. When I was running it on datastage, the row/second is 5. The datastage server and oracle server are on the same box, and the system is not very busy. I could not figure why it is running so slow in datastage?

Can somebody give me an idea how to improve the performance?

Thanks,

Carol

TOAD by default returns the first 100 rows it finds whereas when you run the job it tries to return the entire set.

ray.wurlod · Post by **ray.wurlod** » Sun Jun 25, 2006 10:34 pm

Try setting array size to 100 (the same as TOAD) or even 1.

DSXchange

Performance in Extraction

Performance in Extraction

Re: Performance in Extraction