group by issue

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Svengiyil
Participant
Posts: 25
Joined: Tue May 05, 2009 12:16 am

group by issue

Post by Svengiyil »

Hi

I have to do a group by on the key column in DS job, tried to use a sort stage but realized sort stage in parallel does not have the group by option.
tried to use an aggregator, though the aggregator has a group by option,
i cannot pass the rest of the column values(other than the key column) as it is into the target sequential file without specifying some kind of calculation. how do i proceed?


Thanks,
Svengiyil
dxk9
Participant
Posts: 105
Joined: Wed Aug 19, 2009 12:46 am
Location: Chennai, Tamil Nadu

Post by dxk9 »

Can you explain the requirement clearly. Its kind of confusing to me. :?

Regards,
Divya
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

You can split them into two links - with one link to agg and returning back via a join stage.

Simplest will be to use a transformer to do the calculation.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So, what is it that you want to do, sort or group? You can't "group" records and keep all of the detail, the end result will be fewer record so some sort of "calculation" (sum,max,last,etc) needs to be done on the non-grouped fields. :?

Seems like you may just want to sort your output data.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Svengiyil
Participant
Posts: 25
Joined: Tue May 05, 2009 12:16 am

Post by Svengiyil »

ok let me explain...i have an input file in which characters from 1 to 10 are the key columns, i'm splitting this file into two parts , characters 1 to 10 as col1 and the rest as col2, now i need to group the output using col1 and store it into a target file(target file contains both col1 and col2), How do i achieve this?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hmmm... still clear as mud. Can you provide examples of what it looks like before, then split and then the desired output? Hopefully that will help light the :idea:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The "fork join" model suggested by Sainath seems to me to be the way to go.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
RAJARP
Participant
Posts: 147
Joined: Thu Dec 06, 2007 6:46 am
Location: Chennai

Post by RAJARP »

Hi,
If what i've understood is correct,then

i/p file---->Transformer(T)--------------->Join Stage(J)------->Target

(i/p Link of (A) from transfomer(T)) ( o/p link of (A) connect to join stage(J))
-------------------------------> Aggregator(A)---------->


1.From trasfomer to join stage, pass all the columns
2.From Transfomer to the aggregator ,the columns you want to do a 'group by' operation.
3.In aggregator do the 'group by' operation.
4.In the join stage do the join using col1 as key and get the grouby value and pass it to the target.

So you would be having col1, col2 and the coulmn you have done 'group by' in target.

Regards,
Raja R P
dxk9
Participant
Posts: 105
Joined: Wed Aug 19, 2009 12:46 am
Location: Chennai, Tamil Nadu

Post by dxk9 »

Svengiyil,

As per your requirement, you have 2 columns, want to group by Col1 and also want both the Cols in the output... which is similar to sorting by Col1.

Plz let me know if my understanding of the requirement is wrong.

Regards,
Divya
Svengiyil
Participant
Posts: 25
Joined: Tue May 05, 2009 12:16 am

Post by Svengiyil »

Hi Raja, i tried ur method and it works fine . Thanks a lot.
Svengiyil
Participant
Posts: 25
Joined: Tue May 05, 2009 12:16 am

Post by Svengiyil »

Hi Divya,

Ur understanding is correct and i tried the method suggested by Raja, it works.
dxk9
Participant
Posts: 105
Joined: Wed Aug 19, 2009 12:46 am
Location: Chennai, Tamil Nadu

Post by dxk9 »

Svengiyil,

Good to hear that you got a good solution!!! :)

But jus curious to know whether you get the same output when you sort it by the key column ?? As per my understanding, you should.

Regards,
Divya
Svengiyil
Participant
Posts: 25
Joined: Tue May 05, 2009 12:16 am

Post by Svengiyil »

I may have to group more than one column in future, in which case only sort would not serve the purpose.
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

Amazed that came to a resolution... or even if there was an issue...

By doing Xfm->Join (non-group cols) and Xfm->Agg->Join (group cols) then your non-group columns will just explode the group back to the starting number of rows... so what is achieved?? Your non-group cols would need to go Xfm->Dedup->Join

Alternatively, use the Min/Max option of the aggregator and select the preserve source column type... (or something like that).
Post Reply