I am facing a very strange problem and would be extremely grateful if any of you could help me with the solution.
I have a query which pivots, aggregates (then merges with the same data set), and finally performs a join. The two tables involved contain 30 million (main table) and 2.5 million (join table) records respectively. The query takes close to 15 minutes to run in TOAD and returns 17 million records.
However, when I paste the same query into the OCI stage and run the DS job, it takes more than 3 hours to run!
Why does this happen?
And more importantly, what can I do to improve this performance?
Three hours for this query is unacceptable. If this trend continues, the client will have to wait a month to get his data.
If that's all you're doing, you're probably better off leaving the data on the database server. Why drag 17 million rows of data across the network just to push them back across the network again?
If your transformer stage is doing transformation work that can't be accomplished on the database server, then traversing the network is your cost of doing those transformations.
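To see why dragging the rows out and back across the network is so costly, it helps to count client/server round trips. A minimal sketch, assuming each fetch or insert call moves one "array" of rows; the 17 million row count is from the post above, and the array sizes are hypothetical values for illustration:

```python
import math

def round_trips(rows: int, array_size: int) -> int:
    """Number of network round trips needed to move `rows` records
    when the client moves `array_size` records per call."""
    return math.ceil(rows / array_size)

# Illustrative figures only: 17 million rows, as in the job above.
rows = 17_000_000
for array_size in (1, 100, 1_000):
    print(f"array size {array_size:>5}: {round_trips(rows, array_size):>10,} round trips")
```

With an array size of 1, that is 17 million separate trips across the network in each direction; at 1,000 it drops to 17,000. Running the whole statement on the database server reduces the data moved over the network to essentially zero.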
Mike,
That is exactly what I want to do, but through DataStage. This is one of the steps in an ETL design being developed entirely in DS. How can I achieve what you mentioned (running the query on the database itself instead of dragging the data across) using a DS server job?
arpitchopra wrote: Job design!
I told you already:
Oracle OCI ---> Transformer -----> Oracle OCI
And where exactly did you already post this information?
There are many sides to this equation and how long the select allegedly takes is only a part of it. Add a constraint to your transformer and set it to @FALSE. That will shut off the load aspect and we'll see how long just the select part takes.
While we're at it, what settings are you using in both stages? In particular, the Array Size on both ends plus the Transaction Size and Update Action of the target. All of those will affect "performance".
-craig
"You can never have too many knives" -- Logan Nine Fingers
Three months into DS and I'm looking at these numbers for the first time. Pardon my naivety.
But again, I'm not performing any task in the transformer. Isn't there a feature in DS similar to "Pushdown Automation" in Informatica, with which I could perform all these actions on the database itself without transferring 17 million rows over the network?
Your target array size is killing you. Did you try the @FALSE suggestion? I'd still be curious what your 'Update Action' is but in the meantime bump up your target array size to something larger so you are not sending one record at a time across the network to the database. Perhaps 1000 to start with.
arpitchopra wrote: But again, I'm not performing any task in the transformer. Isn't there a feature in DS similar to "Pushdown Automation" in Informatica, with which I could perform all these actions on the database itself without transferring 17 million rows over the network?
Not in the Server product and not in your version. I've heard PX 8.5 may have that option, but I'm not sure whether that is the case.
-craig
"You can never have too many knives" -- Logan Nine Fingers
You can replace the transformer with an IPC stage if there are no derivations involved, or set the inter-process property in Job Properties.
"Pushdown Automation" --> A workaround would be to create a job which reads some constant from the DUAL table and loads it to a file. In the before-SQL or after-SQL of the source stage, use the query you ran in TOAD. But you would never know how many records were processed.
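A sketch of that workaround, purely for illustration (the table and column names below are hypothetical placeholders, not from the original job):

```sql
-- Source stage user-defined SQL: a one-row dummy select so the job has something to read.
SELECT 1 AS dummy FROM dual;

-- Source stage before-SQL: do the heavy lifting entirely on the database server.
-- Substitute your real 15-minute TOAD query for the placeholder SELECT.
INSERT INTO target_table
SELECT m.col1, l.col2
FROM   main_table m
JOIN   lookup_table l ON l.key = m.key;
COMMIT;
```

As noted above, the trade-off is that the DS log will report one row processed regardless of how many rows the INSERT actually moved, so row-count auditing has to be done on the database side.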
First of all, apologies for re-opening a resolved thread, but I had to (or else reference this thread in a new post), since I'm getting a very similar (actually the same) problem in another one of my jobs.
The source has a custom SQL which returns the complete result set in 18 mins (I checked this by adding the @FALSE constraint in the transformer). The source stats are given below:
Array Size: 20,000 (what is the upper limit on this size?)
Prefetch memory setting: 10,000
The transformer provides value to a column using a parameter's value.
The target performs a truncate/insert load on another table. The target stats are given below:
Array Size: 20,000
Transaction Size: 10,000
Rows per transaction: 10,000
When I run the entire job, it takes 55 minutes, roughly triple the time taken just to select the data from the source.
Why such strange behavior? Why, DS, why do you do this to me!?
Because you - YOU - are setting your array sizes ridiculously high. The "sweet spot" is usually low thousands, even just hundreds, depending on your row sizes.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.