Need Aggregation logic in transformer

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dstest
Participant
Posts: 66
Joined: Sun Aug 19, 2007 10:52 pm

Need Aggregation logic in transformer

Post by dstest »

I have areuqirement like below.

I need to find the max value by grouping on two columns.If even if i have a negative value but i need to take absolute value but in the output i need to populate originaal value.



Input data:

100,200,-5
100,200,1
100,200,-4

Output :
100,200,-5

Thanks
stefanfrost1
Premium Member
Premium Member
Posts: 99
Joined: Mon Sep 03, 2007 7:49 am
Location: Stockholm, Sweden

Post by stefanfrost1 »

You need to ensure sorted data by your defined keys and ensure that your partitioning supports that sorted order. When all that is complete then you can create one or as many stage varibles as needed to kep track of your keys and stage varibles for your aggregates or max-values. Then just let datastage process your data only letting the last record of each group go through the transformer carrying the aggregate values. Set changedrec key at your sort to simplify this process and make sure it defines the last record in your group...

Simple example

IsSame = If LastKey = InStream.Key Then Yes Else No
LastKey = InStream.Key
Counter = If IsSame = Yes Then Counter + 1 Else 0
ValueCount = If IsSame = Yes Then ValueCount + InStream.Value Else 0
MaxValue =
If IsSame = Yes Then
If MaxValue < InStream.MaxValue Then InStream.MaxValue
Else MaxValue
Else -999

In Your Output Stream, Put Filter Option
If IsSame = Yes And LastRec = 1 Then 1 Else 0
-------------------------------------
http://it.toolbox.com/blogs/bi-aj
my blog on delivering business intelligence using agile principles
stefanfrost1
Premium Member
Premium Member
Posts: 99
Joined: Mon Sep 03, 2007 7:49 am
Location: Stockholm, Sweden

Post by stefanfrost1 »

You need to ensure sorted data by your defined keys and ensure that your partitioning supports that sorted order. When all that is complete then you can create one or as many stage varibles as needed to kep track of your keys and stage varibles for your aggregates or max-values. Then just let datastage process your data only letting the last record of each group go through the transformer carrying the aggregate values. Set changedrec key at your sort to simplify this process and make sure it defines the last record in your group...

Simple example

IsSame = If LastKey = InStream.Key Then Yes Else No
LastKey = InStream.Key
Counter = If IsSame = Yes Then Counter + 1 Else 0
ValueCount = If IsSame = Yes Then ValueCount + InStream.Value Else 0
MaxValue =
If IsSame = Yes Then
If MaxValue < InStream.MaxValue Then InStream.MaxValue
Else MaxValue
Else -999

In Your Output Stream, Put Filter Option
If IsSame = Yes And LastRec = 1 Then 1 Else 0
-------------------------------------
http://it.toolbox.com/blogs/bi-aj
my blog on delivering business intelligence using agile principles
stefanfrost1
Premium Member
Premium Member
Posts: 99
Joined: Mon Sep 03, 2007 7:49 am
Location: Stockholm, Sweden

Post by stefanfrost1 »

You need to ensure sorted data by your defined keys and ensure that your partitioning supports that sorted order. When all that is complete then you can create one or as many stage varibles as needed to kep track of your keys and stage varibles for your aggregates or max-values. Then just let datastage process your data only letting the last record of each group go through the transformer carrying the aggregate values. Set changedrec key at your sort to simplify this process and make sure it defines the last record in your group...

Simple example

IsSame = If LastKey = InStream.Key Then Yes Else No
LastKey = InStream.Key
Counter = If IsSame = Yes Then Counter + 1 Else 0
ValueCount = If IsSame = Yes Then ValueCount + InStream.Value Else 0
MaxValue =
If IsSame = Yes Then
If MaxValue < InStream.MaxValue Then InStream.MaxValue
Else MaxValue
Else -999

In Your Output Stream, Put Filter Option
If IsSame = Yes And LastRec = 1 Then 1 Else 0
-------------------------------------
http://it.toolbox.com/blogs/bi-aj
my blog on delivering business intelligence using agile principles
stefanfrost1
Premium Member
Premium Member
Posts: 99
Joined: Mon Sep 03, 2007 7:49 am
Location: Stockholm, Sweden

Post by stefanfrost1 »

Sorry about the multiple posting :cry:
-------------------------------------
http://it.toolbox.com/blogs/bi-aj
my blog on delivering business intelligence using agile principles
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Next time, know that you can always delete your own post as long as there are no replies to it. So when this happens to me, I start from the bottom and delete the extra ones. Too late now, though. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

A suggestion: Instead of writing all that code in the transformer, you can create a temp table, load your data with an extra column that has absolute values, do a group by and get your result.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Post Reply