Page 1 of 1

grouping

Posted: Mon Jun 19, 2006 11:33 pm
by dspxlearn
Hi all,

I have a requirement where, i have some groups coming up from the source and i have to capture the number of groups and pass the count further. I believe we cannot do this in aggregator because it will count the no of records for each group.
Ex:

Col1:- 10,10,10,10
col1:- 20,20
Col1:- 30,30,30

I want the value as 3 as there are 3 groups.But when we use aggregator by using 'group by column' as col1 and using 'type of calucation as = row count' i will get 4,2,3.

So, how do i achieve this.

Posted: Tue Jun 20, 2006 12:21 am
by ArndW
The sample data you have listed is somewhat unclear - you have 3 rows with differing number of columns. A "group" in this context is normally a set of records where one column contains the same value. Is it "Col1:-" in this case? Or have you displayed only values for "Col1"?
Could you explain just a bit in detail what your data values are and what your "groups" are; I think you are not trying to do something overly complex and it can be done in an aggregator stage or even with a transform stage.

Posted: Tue Jun 20, 2006 1:02 am
by ray.wurlod
Two Aggregator stages - one to form the groups and the next to count them?

Posted: Tue Jun 20, 2006 1:33 am
by kumar_s
Or with the same Aggregator stage, with additional Dummy column with a value '1' and sum it up.
Or with the sort stage with couple of columns defined as Key and use KeyValueChange option to differentiate a group.

Posted: Tue Jun 20, 2006 2:51 am
by dspxlearn
Hi thanks for your replies..

ArndW,
My data is like this..

Col1
10
10
10
20
20
30
30
30
30

I should get the total no. of groups for this column. I mean in this case i should get 3.

ray.wurlod,
Do you mean to say that, i should do groupwise counting in first one and total groups counting in the second aggregator?? I this case in the second aggregator if i use the 'grouping column' as col1 and use 'type of calucation' as 'row count', the job is aborting as the single aggregator will not do grouping and calucation on the same column.

kumar_s,
i did the same thing but it needed 2 additional transformers which i want to reduce. So, i looking for other way around. I will try your second suggestion using sort stage...

Posted: Tue Jun 20, 2006 3:01 am
by ArndW
If that is all you need then a simple transform stage and not an aggregator is sufficient for you.
Create a stage variable "GroupCount" with a derivation of "IF In.Col1=LastGroup THEN GroupCount += 1 ELSE GroupCount" and an initial value of 0
Creat another stage variable after that called "LastGroup" with a derivation of "In.Col1".

Output the value of GroupCount with each row.

Posted: Tue Jun 20, 2006 3:11 am
by mpouet
Hi,

I think I would sort and "Remove Duplicates" (partitioned) to keep the distinct group values, and then count (sequential) the number of groups with @inrownum or an aggregator...

Matthieu.

Posted: Tue Jun 20, 2006 3:25 am
by ArndW
I just noticed that I posted a SERVER response in the EE forum. Ignore my previous post!

Matthieu's response is a good approach in a PX job.

Posted: Tue Jun 20, 2006 4:44 am
by kumar_s
You can accomplish with single sort stage with keyValueChange option. Assigning the col1 as key. (With the same as hash partitioned)

Posted: Tue Jun 20, 2006 7:07 am
by mpouet
Hi,

If your are working with a "fixed width column" flat file you can also use this unix command line :
cut -cx-y file | sort -u | wc -l > tempfile

x : starting character
y : ending character
file : your file's name (better with the path)
tempfile : file to store the number of groups (better with the path)

This is not a Datastage solution, but it works well. It can be a good solution if you don't have any transformation.

Matthieu.

Posted: Tue Jun 20, 2006 7:17 am
by kumar_s
Delimeted file shouldnt stop you from this aproach.
cut -d 'x' n file | sort -u | wc -l > tempfile.
Where x is the delimeter used. And n is the required nth column.

Posted: Tue Jun 20, 2006 9:01 am
by mpouet
Right !

Matthieu