Help understand weird behaviour of Aggregator.
Posted: Wed Sep 05, 2012 7:10 pm
I had a problem with Aggregator stages that were giving almost all column values as zero, except for a few records. I was able to resolve the issue by changing the partitioning, but I still don't understand what caused it in the first place.
Job design for the section of interest:
Code:
               Remove Duplicate    Aggregator
               Remove Duplicate    Aggregator
               Remove Duplicate    Aggregator
               Remove Duplicate    Aggregator
Transformer                                      Join Stage    DataSet
               Remove Duplicate    Aggregator
               Remove Duplicate    Aggregator
               Remove Duplicate    Aggregator
               Remove Duplicate    Aggregator
Originally I was doing the following on the input link of the Transformer:
Code:
ProductID      Partition,Sort
OrderDt        Partition,Sort
CompanyName    Sort
Person1Name    Sort
Person2Name    Sort
OrderID        Sort
After the Transformer I am using "SAME" partitioning throughout.
Each output link of the Transformer has a different constraint that decides which records leave on that link.
All Remove Duplicate stages were removing duplicates on the above 6 fields.
All Aggregator stages were grouping on the following fields and doing a record count:
Code:
ProductID
OrderDt
CompanyName
Person1Name
Person2Name
The results from the output of the Aggregators were all coming out as zero.
Then I changed the input link of the Transformer to the following:
Code:
ProductID      Partition,Sort
OrderDt        Partition,Sort
CompanyName    Partition,Sort
Person1Name    Partition,Sort
Person2Name    Partition,Sort
OrderID        Sort
After this, the data started looking correct at the output of the Aggregators.
I don't understand why it did not work when partitioning on 2 fields but did work when partitioning on 5 fields. To me it should have worked in either case.
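To illustrate the expectation above, here is a rough Python sketch (not DataStage code). It assumes that key-based partitioning behaves like a hash on the listed fields, so rows with equal values in the partitioning fields always land in the same partition. The sample rows and the helper names `partition`, `count_per_partition` and `is_consistent` are made up for the illustration. Under that assumption, partitioning on a subset of the grouping fields keeps every group inside one partition, so the per-partition counts should match a sequential count in both setups. The sketch only shows why both setups would be expected to agree; it does not explain the zeros that were actually observed.

```python
import itertools
from collections import Counter

# Grouping fields used by the Aggregator stages in the post.
GROUP_KEYS = ("ProductID", "OrderDt", "CompanyName", "Person1Name", "Person2Name")


def partition(rows, keys, n_parts):
    """Hash-partition rows on `keys` (assumed stand-in for key-based partitioning)."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        idx = hash(tuple(row[k] for k in keys)) % n_parts
        parts[idx].append(row)
    return parts


def count_per_partition(parts, group_keys):
    """Count records per group independently inside each partition."""
    results = []
    for part in parts:
        counts = Counter(tuple(row[k] for k in group_keys) for row in part)
        results.extend(counts.items())
    return results


def is_consistent(rows, partition_keys, n_parts=4):
    """True if no group is split across partitions and counts match a sequential count."""
    expected = Counter(tuple(r[k] for k in GROUP_KEYS) for r in rows)
    produced = count_per_partition(partition(rows, partition_keys, n_parts), GROUP_KEYS)
    return len(produced) == len(expected) and dict(produced) == dict(expected)


# Hypothetical sample data: 12 groups with 3 orders each.
rows = []
combos = itertools.product(
    (101, 102, 103), ("2012-09-01", "2012-09-02"), ("Acme", "Globex"), range(3)
)
for order_id, (product, order_dt, company, _) in enumerate(combos):
    rows.append({
        "ProductID": product,
        "OrderDt": order_dt,
        "CompanyName": company,
        "Person1Name": "P1",
        "Person2Name": "P2",
        "OrderID": order_id,
    })

# Original setup: partition on 2 of the 5 grouping fields.
print(is_consistent(rows, ("ProductID", "OrderDt")))
# Changed setup: partition on all 5 grouping fields.
print(is_consistent(rows, GROUP_KEYS))
```

Both checks compare against a plain sequential count, so if the real job behaves differently between the two setups, the cause is presumably something other than groups being split by the partitioning keys alone.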
Also please let me know if you see any other design/partitioning issues with the above job design.