group by issue
Moderators: chulett, rschirm, roy
group by issue
Hi
I have to do a group by on the key column in DS job, tried to use a sort stage but realized sort stage in parallel does not have the group by option.
tried to use an aggregator, though the aggregator has a group by option,
i cannot pass the rest of the column values(other than the key column) as it is into the target sequential file without specifying some kind of calculation. how do i proceed?
Thanks,
Svengiyil
I have to do a group by on the key column in DS job, tried to use a sort stage but realized sort stage in parallel does not have the group by option.
tried to use an aggregator, though the aggregator has a group by option,
i cannot pass the rest of the column values(other than the key column) as it is into the target sequential file without specifying some kind of calculation. how do i proceed?
Thanks,
Svengiyil
-
- Participant
- Posts: 3337
- Joined: Mon Jan 17, 2005 4:49 am
- Location: United Kingdom
So, what is it that you want to do, sort or group? You can't "group" records and keep all of the detail, the end result will be fewer record so some sort of "calculation" (sum,max,last,etc) needs to be done on the non-grouped fields.
Seems like you may just want to sort your output data.
Seems like you may just want to sort your output data.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
ok let me explain...i have an input file in which characters from 1 to 10 are the key columns, i'm splitting this file into two parts , characters 1 to 10 as col1 and the rest as col2, now i need to group the output using col1 and store it into a target file(target file contains both col1 and col2), How do i achieve this?
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Hi,
If what i've understood is correct,then
i/p file---->Transformer(T)--------------->Join Stage(J)------->Target
(i/p Link of (A) from transfomer(T)) ( o/p link of (A) connect to join stage(J))
-------------------------------> Aggregator(A)---------->
1.From trasfomer to join stage, pass all the columns
2.From Transfomer to the aggregator ,the columns you want to do a 'group by' operation.
3.In aggregator do the 'group by' operation.
4.In the join stage do the join using col1 as key and get the grouby value and pass it to the target.
So you would be having col1, col2 and the coulmn you have done 'group by' in target.
Regards,
Raja R P
If what i've understood is correct,then
i/p file---->Transformer(T)--------------->Join Stage(J)------->Target
(i/p Link of (A) from transfomer(T)) ( o/p link of (A) connect to join stage(J))
-------------------------------> Aggregator(A)---------->
1.From trasfomer to join stage, pass all the columns
2.From Transfomer to the aggregator ,the columns you want to do a 'group by' operation.
3.In aggregator do the 'group by' operation.
4.In the join stage do the join using col1 as key and get the grouby value and pass it to the target.
So you would be having col1, col2 and the coulmn you have done 'group by' in target.
Regards,
Raja R P
Amazed that came to a resolution... or even if there was an issue...
By doing Xfm->Join (non-group cols) and Xfm->Agg->Join (group cols) then your non-group columns will just explode the group back to the starting number of rows... so what is achieved?? Your non-group cols would need to go Xfm->Dedup->Join
Alternatively, use the Min/Max option of the aggregator and select the preserve source column type... (or something like that).
By doing Xfm->Join (non-group cols) and Xfm->Agg->Join (group cols) then your non-group columns will just explode the group back to the starting number of rows... so what is achieved?? Your non-group cols would need to go Xfm->Dedup->Join
Alternatively, use the Min/Max option of the aggregator and select the preserve source column type... (or something like that).