How can I apply this logic?

bond88 · Post by **bond88** » Thu Dec 06, 2012 2:16 pm

Hi,
I have a input like this

ID Pin1 Pin2
1 123 0
1 855 1
1 855 2
2 52 0
2 120 1

I need output like this

ID Pin1 Pin2
1 855 2
2 120 1

Conditions:

Unique ID
Pin 1 maximum
if you have 2 records with maximum Pin 1 then take maximum Pin 2

Any suggestions?

Thanks,

chulett · Post by **chulett** » Thu Dec 06, 2012 2:25 pm

Aggregator. Group by the first column, take max() on both of the other two columns.

ray.wurlod · Post by **ray.wurlod** » Thu Dec 06, 2012 2:31 pm

Partition by ID. Sort by ID and Pin. Use Remove Duplicates to preserve last record in each group, where groups are defined by ID.

bond88 · Post by **bond88** » Thu Dec 06, 2012 2:52 pm

chulett wrote:Aggregator. Group by the first column, take max() on both of the other two columns. ...

Thanks Chulett,
I am able to group by ID in aggregator state how can I select max() values of pin1 and pin2 ? Please suggest me.

Thanks,

bond88 · Post by **bond88** » Thu Dec 06, 2012 4:22 pm

ray.wurlod wrote:Partition by ID. Sort by ID and Pin. Use Remove Duplicates to preserve last record in each group, where groups are defined by ID. ...

Hi Ray,
I can sort id and pin's by using sort stage. How could I get done the partition on ID? Please suggest me.

Thanks,

chulett · Post by **chulett** » Thu Dec 06, 2012 5:19 pm

I don't have any documentation in front of me at the moment but it should be pretty straight-forward to work out what you need there. Basically, for every field you are not grouping by you have to tell the Aggregator what aggregation method you want applied. One of the available methods is Max or Maximum or something very similar. Choose that as the derivation of the output columns.

Checked the docs: under Aggregations you would use Maximum Value.

ray.wurlod · Post by **ray.wurlod** » Thu Dec 06, 2012 7:28 pm

Specify Modulus or Hash (with ID as the selected key) as the partitioning algorithm, on the Partitioning tab of the Input link properties in the Sort stage.

bond88 · Post by **bond88** » Fri Dec 07, 2012 10:44 am

Thanks Ray,

One question, I didn't see any difference by selecting key as a partitioning algorithm. Even it returns same rows before also. I used sort stage to sort on key, id and id2 and after that I used remove duplicates. Whats the purpose of using partitioning (hash or modulus) on key.

Thank you,

ray.wurlod · Post by **ray.wurlod** » Fri Dec 07, 2012 5:45 pm

"Key Adjacency" to make sure that any one Key value only occurs on one partition: that is, that each key value only generates a single group. You don't do this and, worst case, you'll get one group per key per node.

harishkumar.upadrasta · Wed Dec 26, 2012 10:52 am

This can be achieved using stage variable's too..sort the data using your 2 fields svF1 is initially set to blank. If svF1=field 1 value then set svF1 to field1. So if the value changes then pass the value of firld2 to output....

ray.wurlod · Post by **ray.wurlod** » Wed Dec 26, 2012 4:07 pm

... only if the fields you're comparing in the stage variables have the same value on the same partition. Hence the need for correct partitioning.

harishkumar.upadrasta · Wed Dec 26, 2012 10:10 pm

Yeah Ray, my bad missed that point. You need to hash partition the data so that similar key values falls in same Partition and above logic works.