How can I apply this logic?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
bond88
Participant
Posts: 109
Joined: Mon Oct 15, 2012 10:05 am
Location: USA

How can I apply this logic?

Post by bond88 »

Hi,
I have a input like this

ID Pin1 Pin2
1 123 0
1 855 1
1 855 2
2 52 0
2 120 1

I need output like this

ID Pin1 Pin2
1 855 2
2 120 1

Conditions:

Unique ID
Pin 1 maximum
if you have 2 records with maximum Pin 1 then take maximum Pin 2

Any suggestions?

Thanks,
Bhanu
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Aggregator. Group by the first column, take max() on both of the other two columns.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Partition by ID. Sort by ID and Pin. Use Remove Duplicates to preserve last record in each group, where groups are defined by ID.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bond88
Participant
Posts: 109
Joined: Mon Oct 15, 2012 10:05 am
Location: USA

Post by bond88 »

chulett wrote:Aggregator. Group by the first column, take max() on both of the other two columns. ...
Thanks Chulett,
I am able to group by ID in aggregator state how can I select max() values of pin1 and pin2 ? Please suggest me.

Thanks,
Bhanu
bond88
Participant
Posts: 109
Joined: Mon Oct 15, 2012 10:05 am
Location: USA

Post by bond88 »

ray.wurlod wrote:Partition by ID. Sort by ID and Pin. Use Remove Duplicates to preserve last record in each group, where groups are defined by ID. ...
Hi Ray,
I can sort id and pin's by using sort stage. How could I get done the partition on ID? Please suggest me.

Thanks,
Bhanu
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I don't have any documentation in front of me at the moment but it should be pretty straight-forward to work out what you need there. Basically, for every field you are not grouping by you have to tell the Aggregator what aggregation method you want applied. One of the available methods is Max or Maximum or something very similar. Choose that as the derivation of the output columns.

Checked the docs: under Aggregations you would use Maximum Value.
Last edited by chulett on Thu Dec 06, 2012 11:25 pm, edited 1 time in total.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Specify Modulus or Hash (with ID as the selected key) as the partitioning algorithm, on the Partitioning tab of the Input link properties in the Sort stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bond88
Participant
Posts: 109
Joined: Mon Oct 15, 2012 10:05 am
Location: USA

Post by bond88 »

Thanks Ray,

One question, I didn't see any difference by selecting key as a partitioning algorithm. Even it returns same rows before also. I used sort stage to sort on key, id and id2 and after that I used remove duplicates. Whats the purpose of using partitioning (hash or modulus) on key.

Thank you,
Bhanu
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

"Key Adjacency" to make sure that any one Key value only occurs on one partition: that is, that each key value only generates a single group. You don't do this and, worst case, you'll get one group per key per node.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
harishkumar.upadrasta
Participant
Posts: 18
Joined: Tue Dec 25, 2012 10:39 pm
Location: Detroit,MI

Post by harishkumar.upadrasta »

This can be achieved using stage variable's too..sort the data using your 2 fields svF1 is initially set to blank. If svF1=field 1 value then set svF1 to field1. So if the value changes then pass the value of firld2 to output....
Harish
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

... only if the fields you're comparing in the stage variables have the same value on the same partition. Hence the need for correct partitioning.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
harishkumar.upadrasta
Participant
Posts: 18
Joined: Tue Dec 25, 2012 10:39 pm
Location: Detroit,MI

Post by harishkumar.upadrasta »

Yeah Ray, my bad missed that point. You need to hash partition the data so that similar key values falls in same Partition and above logic works.
Harish
Post Reply