Any way to get the count of a particular column?

gowrishankar_h · Post by **gowrishankar_h** » Sat Sep 07, 2013 12:47 pm

Hi,

I have a requirement to read several datasets, filter different columns against hard-coded values, and write the count of matching rows for each column to a separate dataset.

example:

I have 2 datasets:
1) dataset1 (col1,col2,col3,col4)
2) dataset2 (col21,col22,col33,col44)

I need to read those 2 datasets, filter col1 on a hard-coded value 'A', col2 on 'B', etc., get the count for each individual column, and write the counts to dataset3 as below.

dataset3(count_col1,count_col2,count_col3 etc)

Note: I can't use the Aggregator stage, since I have to read many datasets and count many columns, and it will affect performance. So first I filtered each individual column and wrote it to a separate dataset. I then used an External Source stage to count the number of records in each dataset, but with this method I need many datasets, one for each individual column. Is there any other way to reduce the number of datasets without affecting performance?

Thanks in advance

ray.wurlod · Post by **ray.wurlod** » Sat Sep 07, 2013 3:37 pm

You CAN use Aggregator stage. If you precede it by a Sort stage, the Aggregator stage will be fast.

You could perform the counts in a Transformer stage, but that would involve setting up a stage variable for each count, which you may regard as tedious.
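
To make the stage-variable idea concrete, here is a rough sketch of the logic in Python rather than in DataStage itself. It is only an illustration: the `FILTERS` mapping, the `count_matches` function and the sample rows are hypothetical, and each counter plays the role that one stage variable would play in a Transformer. The point is that all counts are accumulated in a single pass over the rows, with no intermediate dataset per column.

```python
from typing import Dict, Iterable, Mapping

# Hypothetical filter rules: column name -> hard-coded value to match.
FILTERS = {"col1": "A", "col2": "B"}


def count_matches(
    rows: Iterable[Mapping[str, str]], filters: Mapping[str, str]
) -> Dict[str, int]:
    """Count, in one pass, how many rows match each column's hard-coded value."""
    # One counter per filtered column, analogous to one stage variable each.
    counts = {f"count_{col}": 0 for col in filters}
    for row in rows:
        for col, wanted in filters.items():
            if row.get(col) == wanted:
                counts[f"count_{col}"] += 1
    return counts


# Made-up sample rows standing in for dataset1.
dataset1 = [
    {"col1": "A", "col2": "B", "col3": "x", "col4": "y"},
    {"col1": "A", "col2": "C", "col3": "x", "col4": "y"},
    {"col1": "Z", "col2": "B", "col3": "x", "col4": "y"},
]

# A single output row shaped like dataset3 (count_col1, count_col2, ...).
dataset3_row = count_matches(dataset1, FILTERS)
print(dataset3_row)
```

In a real job the counters would be emitted once, after the last input row, so that dataset3 receives a single row of totals.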