Hi All,
I have the data as shown below and I need to pull the most dominant value, pl give me the best logic applicable
ID DT
1 X
1 X
1 Y
1 Y
1 Y
1 Z
There are two fields primary and secondary and my data should look like
ID DT Primary Secondary
1 X Y X
1 Y Y X
1 Z Y X
Here Y is the most domianant and X is the second most dominant.
Thanks in advance.
How to find the most dominant valuex
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 301
- Joined: Thu Jul 14, 2005 10:27 am
- Location: Melbourne, Australia
- Contact:
Assuming ...
1. By 'dominant' you mean 'frequent'
2. Your input data is sorted (as it is in your example)
Without designing the job in its entirety for you, here are a few suggestions ...
- Use an aggregator to generate a count of each DT value within each ID group
- Sort and propagate the first two rows (ie the two most frequent) on individual links - 'primary' and 'secondary' as apppropriate. You may want to use a key-change column and transformer stage variables to identify these.
- Rejoin primary and secondary columns to the input using the ID
That should hopefully be enough to get you started!
John.
1. By 'dominant' you mean 'frequent'
2. Your input data is sorted (as it is in your example)
Without designing the job in its entirety for you, here are a few suggestions ...
- Use an aggregator to generate a count of each DT value within each ID group
- Sort and propagate the first two rows (ie the two most frequent) on individual links - 'primary' and 'secondary' as apppropriate. You may want to use a key-change column and transformer stage variables to identify these.
- Rejoin primary and secondary columns to the input using the ID
That should hopefully be enough to get you started!
John.
Last edited by jhmckeever on Mon Dec 14, 2009 1:28 am, edited 1 time in total.
<b>John McKeever</b>
Data Migrators
<b><a href="https://www.mettleci.com">MettleCI</a> - DevOps for DataStage</b>
<a href="http://www.datamigrators.com/"><img src="https://www.datamigrators.com/assets/im ... l.png"></a>
Data Migrators
<b><a href="https://www.mettleci.com">MettleCI</a> - DevOps for DataStage</b>
<a href="http://www.datamigrators.com/"><img src="https://www.datamigrators.com/assets/im ... l.png"></a>
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Premium Member
- Posts: 301
- Joined: Thu Jul 14, 2005 10:27 am
- Location: Melbourne, Australia
- Contact:
1. What's the objection/restriction around using Transformer stages?
2. Specify 'ID' and 'DT' as grouping keys and use an aggregation type of 'Count Rows'. This will count unique ID/DT combinations. Define a 'Count Output Column' and add a descending sort by this column.
2. Specify 'ID' and 'DT' as grouping keys and use an aggregation type of 'Count Rows'. This will count unique ID/DT combinations. Define a 'Count Output Column' and add a descending sort by this column.
<b>John McKeever</b>
Data Migrators
<b><a href="https://www.mettleci.com">MettleCI</a> - DevOps for DataStage</b>
<a href="http://www.datamigrators.com/"><img src="https://www.datamigrators.com/assets/im ... l.png"></a>
Data Migrators
<b><a href="https://www.mettleci.com">MettleCI</a> - DevOps for DataStage</b>
<a href="http://www.datamigrators.com/"><img src="https://www.datamigrators.com/assets/im ... l.png"></a>
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Why are you asking me? John McKeever suggested it, and explained correctly how to use it.
Third level = third grouping key (imagine you're doing it with an SQL query).
Third level = third grouping key (imagine you're doing it with an SQL query).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.