using a group by in Oracle stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
siddesai
Participant
Posts: 26
Joined: Thu Apr 26, 2007 11:28 pm

using a group by in Oracle stage

Post by siddesai »

Hey Guys,
I 'm basically writing multiple group-by queries to get my data.

The code goes as mentioned below:

Code: Select all

select subquery.columnname
(select columns from table name
 group by coulums)
group by colums where count (subquery.column) > 2
This above is then joined to a seperate query which is written in a different Oracle stage.

Now, I tried to insert this query in Oracle stage but it doesn't like the way query is gruped.I 'm processing about 150 million rows so performance is the key criteria. If I write this query in any standard SQL client it works perfectly fine and gives me my results but it takes L O T of time to get results. Therefore, I thought about using DS considering its exceptional parallel processing capabilities.

So, my question is:
How do I process the above mentioned query in Oracle stage?

I found a solution on my own is to create a dummy table and dump the records from above query in a table and then use that table data to create my join which simpler and faster. So, I used:

Code: Select all


Oracle Stage  --> Aggregator      --> Aggregator       --> Oracle Stage
(to read table)      (first group by)       (second group by)   (dummy table)

So, do you Guru's think I 'm doing the right thing or is there a better way?? keeping both performance and quality in mind. Please note I 'm new to the DS stuff.

Thanks for your assistance.
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Re: using a group by in Oracle stage

Post by JoshGeorge »

If you get your data out and do required aggregation in datastage, finally storing in a dataset will be a much better strategy than to create and load to a dummy table every time. Definitely an improvement on performance and design. If you intend to use this 'dummy table' as staging table where in some other applications also use outside datastage then above suggestion is not applicable.
siddesai wrote: I found a solution on my own is to create a dummy table and dump the records from above query in a table and then use that table data to create my join which simpler and faster. So, I used:

Code: Select all


Oracle Stage  --> Aggregator      --> Aggregator       --> Oracle Stage
(to read table)      (first group by)       (second group by)   (dummy table)

Joshy George
<a href="http://www.linkedin.com/in/joshygeorge1" ><img src="http://www.linkedin.com/img/webpromo/bt ... _80x15.gif" width="80" height="15" border="0"></a>
siddesai
Participant
Posts: 26
Joined: Thu Apr 26, 2007 11:28 pm

Re: using a group by in Oracle stage

Post by siddesai »

Thanks for your reply.

The table is already loaded but now what about that nested sub query? Any clues on executing that in "User defined query" in Oracle stage?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What kind of problem are you having with it? Any valid SQL can be run as 'user defined' in the stage. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
keshav0307
Premium Member
Premium Member
Posts: 783
Joined: Mon Jan 16, 2006 10:17 pm
Location: Sydney, Australia

Post by keshav0307 »

you can write a sql query (nested or any type). and probably you don't need the aggregator stage. if it is valid query for sqlplus and return correct value then it must run on the oracle stage.
siddesai
Participant
Posts: 26
Joined: Thu Apr 26, 2007 11:28 pm

Post by siddesai »

@ Craig, Keshav. Thanks for your reply

Well, that's what I have been doing..writing SQL queries in user defined area. But, it seems DS does have issues executing sub queries or say group by queries.

Is there a work around to write group by queries within any stage?

Or

If you have a look at the query that I had posted or the 2 aggregator stages that I used to do double group by...But, my joins just don't work..It doesn't yield any results...

I perform a sort in round robin on the tables coming into join but I don't get any data coming out of joins. It's just weird..
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

siddesai wrote:it seems DS does have issues executing sub queries or say group by queries.
No, it doesn't. There's nothing about DataStage that would make your joins 'not work' - that's between you, your query and your source tables. Get the query working in your favorite tool - Toad, sqlplus, whatever - and it will work 'in DataStage'.

You'd need to post more specifics about what "just doesn't work" means for anyone to be able to help.
-craig

"You can never have too many knives" -- Logan Nine Fingers
siddesai
Participant
Posts: 26
Joined: Thu Apr 26, 2007 11:28 pm

Post by siddesai »

Thanks for your reply Craig.

Basically, I have written a sub query as mentioned in my original post. It goes as follows:

Code: Select all

select SubQuery.columname from
(
 select colum1,column2,column3
from tablename
group by column1,column2,column3

) from subquery

Works great in SQL Developer. So, I guess should work in DS as well.

Do you this kind of query will work in user defined area?
siddesai
Participant
Posts: 26
Joined: Thu Apr 26, 2007 11:28 pm

Post by siddesai »

Thanks for your reply Craig.

Basically, I have written a sub query as mentioned in my original post. It goes as follows:

Code: Select all

select SubQuery.columname from
(
 select colum1,column2,column3
from tablename
group by column1,column2,column3

) from subquery
Works great in SQL Developer. So, I guess should work in DS as well.

Do you this kind of query will work in user defined area?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Right... we all saw your fake sql in the first post. :wink:

As noted, any valid SQL will work in DataStage. What precisely is the problem you have when you try to use this in a job?
-craig

"You can never have too many knives" -- Logan Nine Fingers
siddesai
Participant
Posts: 26
Joined: Thu Apr 26, 2007 11:28 pm

Post by siddesai »

chulett wrote:Right... we all saw your fake sql in the first post. :wink:

As noted, any valid SQL will work in DataStage. What precisely is the problem you have when you try to use this in a job?
I appreciate your response, Craig.

I will try run that job again and keep you posted about it.

Also, I had a chat with one of the instructors who said "If Oracle takes too much time processing the query then DS will fail"
I should have asked this question to him but now that it has popped up in my mind; I would like to ask,

--> With the above mentioned "fake" :lol: query...do you think his comments were right?? I don't think DS can buffer results coming back from Oracle?

If you feel I need to do some reading then please advice :wink:
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

siddesai wrote:Also, I had a chat with one of the instructors who said "If Oracle takes too much time processing the query then DS will fail"
Umm... let's play Jeopardy: "What is false?". :wink:

Let's not worry about that right now. When you get back to your job and try this again, let us know what issue(s) you are having. The more details the merrier.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply