Help understanding weird behaviour of Aggregator.

Post by **highpoint** » Wed Sep 05, 2012 7:10 pm

I had a problem with Aggregator stages that were returning zero for almost all column values, except for a few records. I was able to resolve the issue by changing the partitioning, but I still don't understand what caused it in the first place.

Job design for the section of interest:

Code:

			Remove Duplicate	Aggregator
			Remove Duplicate	Aggregator
			Remove Duplicate	Aggregator
			Remove Duplicate	Aggregator
   Transformer   							Join Stage	DataSet
			Remove Duplicate	Aggregator
			Remove Duplicate	Aggregator
			Remove Duplicate	Aggregator
			Remove Duplicate	Aggregator

Earlier, I was doing the following on the input link of the Transformer:

Code:

ProductID	Partition,Sort
OrderDt		Partition,Sort 
CompanyName	Sort
Person1Name	Sort
Person2Name	Sort
OrderID		Sort

After the Transformer, I am using "SAME" partitioning on every link.

Each output link of the Transformer has a different constraint that determines which records go out on it.

All Remove Duplicate stages were removing duplicates on the 6 fields above.

All Aggregator stages were grouping on the following fields and doing a record count:

Code:

ProductID
OrderDt
CompanyName
Person1Name
Person2Name

The results from the Aggregator outputs were all coming out as zero.

Then I changed the input link of the Transformer to the following:

Code:

ProductID	Partition,Sort
OrderDt		Partition,Sort 
CompanyName	Partition,Sort
Person1Name	Partition,Sort
Person2Name	Partition,Sort
OrderID		Sort

After this, the data coming out of the Aggregators started looking correct.

I don't understand why it did not work when partitioning on 2 fields but worked when partitioning on 5 fields. To me, it should have worked in either case.
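The reasoning behind "it should have worked in either case" can be checked with a small simulation. The sketch below is plain Python, not DataStage: it assumes a hash-style partitioner, and the helper `partitions_per_group`, the sample values, and the partition count of 4 are all made up for illustration. It only shows that hashing on a subset of the grouping keys still keeps every group inside one partition; it does not explain the zero counts.

```python
from collections import defaultdict
from itertools import product

GROUP_KEYS = ["ProductID", "OrderDt", "CompanyName", "Person1Name", "Person2Name"]

def partitions_per_group(rows, partition_keys, n_partitions=4):
    """Map each 5-field group to the set of partitions its rows land in."""
    seen = defaultdict(set)
    for row in rows:
        # Stand-in for a hash partitioner: same key values -> same partition.
        part = hash(tuple(row[k] for k in partition_keys)) % n_partitions
        seen[tuple(row[k] for k in GROUP_KEYS)].add(part)
    return seen

# Hypothetical sample data: every combination of a few values per field.
rows = [
    dict(zip(GROUP_KEYS + ["OrderID"], values))
    for values in product(
        [1, 2, 3], ["2012-09-01", "2012-09-02"],
        ["A", "B"], ["x", "y"], ["p", "q"], [100, 101],
    )
]

two_key = partitions_per_group(rows, ["ProductID", "OrderDt"])
five_key = partitions_per_group(rows, GROUP_KEYS)

# In both cases each group ends up in exactly one partition.
print(all(len(parts) == 1 for parts in two_key.values()))
print(all(len(parts) == 1 for parts in five_key.values()))
```

If this model matches what the engine does, partitioning alone would not split a group in either setup, so the cause is likely elsewhere.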

Also, please let me know if you see any other design or partitioning issues with the above job design.

Post by **soumya5891** » Thu Sep 06, 2012 11:22 am

What aggregation method have you used?
I guess you have used the sort method. If that is the case, then I think the input data to the Aggregator must be sorted on the aggregation keys, in the same order as the grouping keys.
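As a rough analogy for why a sort-based aggregation needs sorted input, here is a small Python sketch. It is not how DataStage is implemented; `sort_mode_count` is a hypothetical helper built on `itertools.groupby`, which, like a sort-method aggregator, only merges rows with equal keys when they arrive next to each other. The sample rows are made up.

```python
from itertools import groupby

def sort_mode_count(rows, key):
    """Count rows per group, assuming rows arrive already sorted by key."""
    return [(k, sum(1 for _ in grp)) for k, grp in groupby(rows, key=key)]

def by_group(row):
    # Group on the whole (ProductID, OrderDt) tuple.
    return row

rows = [("P1", "2012-09-01"), ("P2", "2012-09-01"), ("P1", "2012-09-01")]

# Unsorted input: the two P1 rows are not adjacent, so P1 is reported twice.
unsorted_result = sort_mode_count(rows, by_group)

# Sorted input: each group is reported once with the full count.
sorted_result = sort_mode_count(sorted(rows, key=by_group), by_group)

print(unsorted_result)
print(sorted_result)
```

Note that this analogy produces split groups with low counts rather than zeros, so it illustrates the sorting requirement but may not be the whole story for the problem described above.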