I am facing a very strange problem and would be extremely grateful if any of you could help me with the solution.
I have a query which pivots, aggregates (then merges with the same data set), and finally performs a join. The two tables involved contain 30 million (main table) and 2.5 million (join table) records respectively. The query takes close to 15 minutes to run in TOAD and returns 17 million records.
However, when I paste the same query into the OCI stage and run the DS job, it takes more than 3 hours to run!
Why does this happen?
And more importantly, what can I do to improve this performance?
Three hours for this query is unacceptable. If this trend continues, the client will have to wait a month to get his data.
If that's all you're doing, you're probably better off leaving the data on the database server. Why drag 17 million rows of data across the network just to push them back across the network again?
If your transformer stage is doing transformation work that can't be accomplished on the database server, then traversing the network is your cost of doing those transformations.
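To see why dragging the rows out and back across the network is so costly, it helps to count client/server round trips. A minimal sketch, assuming each fetch or insert call moves one "array" of rows; the 17 million row count is from the post above, and the array sizes are hypothetical values for illustration:

```python
import math

def round_trips(rows: int, array_size: int) -> int:
    """Number of network round trips needed to move `rows` records
    when the client moves `array_size` records per call."""
    return math.ceil(rows / array_size)

# Illustrative figures only: 17 million rows, as in the job above.
rows = 17_000_000
for array_size in (1, 100, 1_000):
    print(f"array size {array_size:>5}: {round_trips(rows, array_size):>10,} round trips")
```

With an array size of 1, that is 17 million separate trips across the network in each direction; at 1,000 it drops to 17,000. Running the whole statement on the database server reduces the data moved over the network to essentially zero.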
Mike,
That is exactly what I want to do, but through DataStage. This is one of the steps in an ETL design being developed entirely in DS. How can I achieve what you mentioned (running the query on the database itself instead of dragging the data across) using a DS server job?
arpitchopra wrote: Job design!
I told you already:
Oracle OCI ---> Transformer -----> Oracle OCI
And where exactly did you already post this information?
There are many sides to this equation and how long the select allegedly takes is only a part of it. Add a constraint to your transformer and set it to @FALSE. That will shut off the load aspect and we'll see how long just the select part takes.
While we're at it, what settings are you using in both stages? In particular, the Array Size on both ends plus the Transaction Size and Update Action of the target. All of those will affect "performance".
-craig
"You can never have too many knives" -- Logan Nine Fingers
Three months into DS and I'm looking at these numbers for the first time. Pardon my naivety.
But again, I'm not performing any task in the transformer. Isn't there a feature in DS similar to "Pushdown Automation" in Informatica, with which I could perform all these actions on the database itself without transferring 17 million rows over the network?
Your target array size is killing you. Did you try the @FALSE suggestion? I'd still be curious what your 'Update Action' is but in the meantime bump up your target array size to something larger so you are not sending one record at a time across the network to the database. Perhaps 1000 to start with.
arpitchopra wrote: But again, I'm not performing any task in the transformer. Isn't there a feature in DS similar to "Pushdown Automation" in Informatica, with which I could perform all these actions on the database itself without transferring 17 million rows over the network?
Not in the Server product and not in your version. I've heard PX 8.5 may have that option, but I'm not sure whether that is the case.
-craig
"You can never have too many knives" -- Logan Nine Fingers
You can replace the transformer with an IPC stage if there are no derivations involved, or set the inter-process property in Job Properties.
"Pushdown Automation" --> A workaround would be to create a job which reads some constant from the DUAL table and loads it to a file. In the before-SQL or after-SQL of the source stage, use the query you ran in TOAD. But you would never know how many records were processed.
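A sketch of that workaround, purely for illustration (the table and column names below are hypothetical placeholders, not from the original job):

```sql
-- Source stage user-defined SQL: a one-row dummy select so the job has something to read.
SELECT 1 AS dummy FROM dual;

-- Source stage before-SQL: do the heavy lifting entirely on the database server.
-- Substitute your real 15-minute TOAD query for the placeholder SELECT.
INSERT INTO target_table
SELECT m.col1, l.col2
FROM   main_table m
JOIN   lookup_table l ON l.key = m.key;
COMMIT;
```

As noted above, the trade-off is that the DS log will report one row processed regardless of how many rows the INSERT actually moved, so row-count auditing has to be done on the database side.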
First of all, apologies for re-opening a resolved thread, but I had to (or else reference this thread in a new post), since I'm getting a very similar (actually the same) problem in another one of my jobs.
The source has a custom SQL which returns the complete result set in 18 mins (I checked this by adding the @FALSE constraint in the transformer). The source stats are given below:
Array Size: 20,000 (what is the upper limit on this size?)
Prefetch memory setting: 10,000
The transformer provides value to a column using a parameter's value.
The target performs a truncate/insert load on another table. The target stats are given below:
Array Size: 20,000
Transaction Size: 10,000
Rows per transaction: 10,000
When I run the entire job, it takes 55 minutes, roughly triple the time taken just to select the data from the source.
Why such strange behavior? Why, DS, why do you do this to me!?
Because you - YOU - are setting your array sizes ridiculously high. The "sweet spot" is usually low thousands, even just hundreds, depending on your row sizes.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.