How to find the most dominant valuex

rafidwh · Post by **rafidwh** » Sun Dec 13, 2009 9:54 pm

Hi All,

I have the data as shown below and I need to pull the most dominant value, pl give me the best logic applicable

ID DT
1 X
1 X
1 Y
1 Y
1 Y
1 Z

There are two fields primary and secondary and my data should look like

ID DT Primary Secondary

1 X Y X
1 Y Y X
1 Z Y X

Here Y is the most domianant and X is the second most dominant.

Thanks in advance.

jhmckeever · Post by **jhmckeever** » Sun Dec 13, 2009 11:37 pm

Assuming ...
1. By 'dominant' you mean 'frequent'
2. Your input data is sorted (as it is in your example)

Without designing the job in its entirety for you, here are a few suggestions ...

- Use an aggregator to generate a count of each DT value within each ID group

- Sort and propagate the first two rows (ie the two most frequent) on individual links - 'primary' and 'secondary' as apppropriate. You may want to use a key-change column and transformer stage variables to identify these.

- Rejoin primary and secondary columns to the input using the ID

That should hopefully be enough to get you started!

John.

rafidwh · Post by **rafidwh** » Mon Dec 14, 2009 12:51 am

Is there a way to check without using a trnasformer stage.

rafidwh · Post by **rafidwh** » Mon Dec 14, 2009 12:54 am

The first point ie the aggregator is bit confusing, how can i get count on DT within group ID

ray.wurlod · Post by **ray.wurlod** » Mon Dec 14, 2009 1:23 am

rafidwh wrote:Is there a way to check without using a trnasformer stage.

Probably. But why?

jhmckeever · Post by **jhmckeever** » Mon Dec 14, 2009 1:26 am

1. What's the objection/restriction around using Transformer stages?

2. Specify 'ID' and 'DT' as grouping keys and use an aggregation type of 'Count Rows'. This will count unique ID/DT combinations. Define a 'Count Output Column' and add a descending sort by this column.

rafidwh · Post by **rafidwh** » Mon Dec 14, 2009 9:53 am

Hi Ray,

I am bit confused on how to use aggregator if I want to use the third dominant also apart from first and second dominant value.

ray.wurlod · Post by **ray.wurlod** » Mon Dec 14, 2009 3:07 pm

Why are you asking me? John McKeever suggested it, and explained correctly how to use it.

Third level = third grouping key (imagine you're doing it with an SQL query).