Page 1 of 1

how to get the distinct count

Posted: Tue Sep 21, 2004 1:14 am
by lakshmipriya
how to count the distinct of a column value

if the column values are

2
3
4
2

then the count should be 3

how to acheive this in datastage other than DB2

Posted: Tue Sep 21, 2004 1:33 am
by ray.wurlod
Use an Aggregator stage. Group your column and count your column.
Refer to Chapter 17 of Parallel Job Developer's Guide (parjdev.pdf) for information on how to use this stage type. It works best if the input data are sorted.

Posted: Tue Sep 21, 2004 9:52 pm
by mandyli
Hi priya,

Yes you can achieve this using Aggregator stage only.

Posted: Tue Sep 21, 2004 10:27 pm
by lakshmipriya
I tired this with too different aggregators thats working fine, whether i can acheive this in a single aggregator.

I tired it with converting the input column to double and having the same coulmn in both group by and count column too. It worked but output i got is not what is expected for distinct count

wht can i do? :oops:

Posted: Wed Sep 22, 2004 12:16 am
by ray.wurlod
Give an example of what you are processing and what you expect to see. A single Aggregator will satisfy the request you originally posted.

Posted: Wed Sep 22, 2004 2:06 am
by lakshmipriya
I want to take a distinct of combination of two columns transaction_date(date) and transaction_sequence_id(integer)

Posted: Wed Sep 22, 2004 2:13 am
by ray.wurlod
In DataStage you need two Aggregator stages.

In most databases you need two separate queries.

Red Brick is the only database of which I am aware that allows multiple DISTINCT clauses in the one SELECT query. But, then, it's designed for data warehouse (business intelligence) queries.