Page 1 of 1

Reverse the aggregation Process

Posted: Wed Dec 23, 2009 12:17 am
by Shamanth_vk
We have requirement like below, the records are in aggregated format and we need to reverse the aggregation process.

For Ex. Input
a,2000,2
b,1500,3

output
-- a,1000
-- a,1000
-- b,500
-- b,500
-- b,500

Please let me know how to acheive this in datastage. Thanks a lot in advance.

Posted: Wed Dec 23, 2009 1:10 am
by ray.wurlod
What is the largest number that will appear in column #3?
What should happen if column #2 is not precisely divisible by column #3?

Posted: Wed Dec 23, 2009 3:42 am
by Shamanth_vk
ray.wurlod wrote:What is the largest number that will appear in column #3?
What should happen if column #2 is not precisely divisible by column #3? ...

- Largest number that will appear would be 4.
- If column #2 is not precisely divisible by column #3. This scenario will not be occurring in our case.

Please let me know if more information is required.

Thanks & Regards,
Shamanth VK

Posted: Wed Dec 23, 2009 4:30 am
by Sainath.Srinivasan
With just max 4 splits, you will be able to code using 4 links from a transformer and funnel them together.

Posted: Wed Dec 23, 2009 4:10 pm
by ray.wurlod
That's what I would suggest too. Constrain the outputs based on the value in column #3.

Posted: Wed Dec 23, 2009 9:50 pm
by vmcburney
Create a text file with these rows in it:
1,1
2,2
2,2
3,3
3,3
3,3
4,4
4,4
4,4
4,4

In a lookup stage join to this file on the aggregated count column and switch on "Multiple Rows Returned from this Link" so the lookup will then create 1 to 4 output rows depending on the number used. A lookup stage can duplicate input rows if multiple lookup rows are found. Send the output to a Transformer where you define the final columns and divide the aggregated number by the aggregated count to come up with the new output amount.

There are two advantages - you only have to define your mappings and rules once and not four times in the transformer and you will find this Transformer easier to code and maintain. It's extendable without changing or recompiling the job - put 5, 6, 7s into that lookup file and it will automatically handle them if they ever come through.