Page 1 of 1

How to find the most dominant valuex

Posted: Sun Dec 13, 2009 9:54 pm
by rafidwh
Hi All,

I have the data as shown below and I need to pull the most dominant value, pl give me the best logic applicable

ID DT
1 X
1 X
1 Y
1 Y
1 Y
1 Z

There are two fields primary and secondary and my data should look like

ID DT Primary Secondary

1 X Y X
1 Y Y X
1 Z Y X

Here Y is the most domianant and X is the second most dominant.

Thanks in advance.

Posted: Sun Dec 13, 2009 11:37 pm
by jhmckeever
Assuming ...
1. By 'dominant' you mean 'frequent'
2. Your input data is sorted (as it is in your example)

Without designing the job in its entirety for you, here are a few suggestions ...

- Use an aggregator to generate a count of each DT value within each ID group

- Sort and propagate the first two rows (ie the two most frequent) on individual links - 'primary' and 'secondary' as apppropriate. You may want to use a key-change column and transformer stage variables to identify these.

- Rejoin primary and secondary columns to the input using the ID

That should hopefully be enough to get you started!

John.

Posted: Mon Dec 14, 2009 12:51 am
by rafidwh
Is there a way to check without using a trnasformer stage.

Posted: Mon Dec 14, 2009 12:54 am
by rafidwh
The first point ie the aggregator is bit confusing, how can i get count on DT within group ID

Posted: Mon Dec 14, 2009 1:23 am
by ray.wurlod
rafidwh wrote:Is there a way to check without using a trnasformer stage.
Probably. But why?

Posted: Mon Dec 14, 2009 1:26 am
by jhmckeever
1. What's the objection/restriction around using Transformer stages?

2. Specify 'ID' and 'DT' as grouping keys and use an aggregation type of 'Count Rows'. This will count unique ID/DT combinations. Define a 'Count Output Column' and add a descending sort by this column.

Posted: Mon Dec 14, 2009 9:53 am
by rafidwh
Hi Ray,

I am bit confused on how to use aggregator if I want to use the third dominant also apart from first and second dominant value.

Posted: Mon Dec 14, 2009 3:07 pm
by ray.wurlod
Why are you asking me? John McKeever suggested it, and explained correctly how to use it.

Third level = third grouping key (imagine you're doing it with an SQL query).