Help understand weird behaviour of Aggregator.

highpoint
Premium Member
Posts: 123
Joined: Sat Jun 19, 2010 12:01 am
Location: Chicago

Post by highpoint »

I had a problem with Aggregator stages that were returning zero for almost all column values, except for a few records. I was able to resolve the issue by changing the partitioning, but I still don't understand what caused it in the first place.


Job design for the section of interest:

Code: Select all

                 Remove Duplicate    Aggregator
                 Remove Duplicate    Aggregator
                 Remove Duplicate    Aggregator
                 Remove Duplicate    Aggregator
   Transformer                                      Join Stage    DataSet
                 Remove Duplicate    Aggregator
                 Remove Duplicate    Aggregator
                 Remove Duplicate    Aggregator
                 Remove Duplicate    Aggregator
Earlier, I had the following on the input link of the Transformer:

Code: Select all

ProductID	Partition,Sort
OrderDt		Partition,Sort 
CompanyName	Sort
Person1Name	Sort
Person2Name	Sort
OrderID		Sort
After the Transformer, I am using "SAME" partitioning throughout.

Each output link of the Transformer has a different constraint that determines which records are passed on.

All Remove Duplicate stages remove duplicates on the above 6 fields.

All Aggregator stages group on the following fields and compute a record count:

Code: Select all

ProductID
OrderDt
CompanyName
Person1Name
Person2Name
The results output by the Aggregators were all coming out as zero.

Then I changed the input link of the Transformer to the following:

Code: Select all

ProductID	Partition,Sort
OrderDt		Partition,Sort 
CompanyName	Partition,Sort
Person1Name	Partition,Sort
Person2Name	Partition,Sort
OrderID		Sort
After this, the data from the output of the Aggregators started looking correct.

I don't understand why it did not work when partitioning on 2 fields but did work when partitioning on 5 fields. To me, it should have worked in either case.
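The expectation that both partitioning schemes should give the same counts can be checked with a small simulation. This is only a sketch in Python, not DataStage: the rows are made up, Python's built-in `hash` stands in for the hash partitioner, and `hash_partition` and `count_within_partitions` are hypothetical helper names. It shows that when the partition keys are a subset of the grouping keys, every group lands in a single partition, so per-partition counts match the global counts. It does not explain the zero values reported above.

```python
from collections import Counter, defaultdict

COLUMNS = ("ProductID", "OrderDt", "CompanyName",
           "Person1Name", "Person2Name", "OrderID")
GROUP_KEYS = COLUMNS[:5]  # the five grouping fields from the post


def hash_partition(rows, keys, n_parts):
    """Assign each row to a partition based on a hash of the given key columns."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(tuple(row[k] for k in keys)) % n_parts].append(row)
    return parts


def count_within_partitions(parts):
    """Count rows per group separately inside each partition, then collect
    every (group, count) pair the partitions emit."""
    out = []
    for part_rows in parts.values():
        counts = Counter(tuple(r[k] for k in GROUP_KEYS) for r in part_rows)
        out.extend(counts.items())
    return sorted(out)


# Made-up sample data, for illustration only.
rows = [dict(zip(COLUMNS, values)) for values in [
    ("P1", "2011-01-01", "Acme", "Ann", "Bob", 1),
    ("P1", "2011-01-01", "Acme", "Ann", "Bob", 2),
    ("P1", "2011-01-01", "Zenith", "Cy", "Di", 3),
    ("P2", "2011-01-02", "Acme", "Ann", "Bob", 4),
]]

# Counts computed over the whole data set, ignoring partitioning.
expected = sorted(Counter(tuple(r[k] for k in GROUP_KEYS) for r in rows).items())

two_key = count_within_partitions(hash_partition(rows, GROUP_KEYS[:2], 4))
five_key = count_within_partitions(hash_partition(rows, GROUP_KEYS, 4))

print(two_key == five_key == expected)
```

Under these assumptions both schemes agree, which suggests the difference seen in the job comes from something other than group co-location, for example how the sort order is carried or used downstream.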

Also, please let me know if you see any other design/partitioning issues with the above job design.
soumya5891
Participant
Posts: 152
Joined: Mon Mar 07, 2011 6:16 am

Post by soumya5891 »

What aggregation method did you use?
I guess you used the sort method. If so, I think the input data to the Aggregator must be sorted on the aggregation keys, in the same order as the aggregation group.
Soumya
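Why a sort-based aggregation depends on sorted input can be sketched in a few lines of Python. This is an analogy only, not how the Aggregator stage is implemented: `sort_method_count` is a hypothetical helper built on `itertools.groupby`, which, like a sort-method aggregation, emits a result each time the key changes. The rows are made up.

```python
from itertools import groupby


def sort_method_count(rows):
    """Count rows per group in a single pass, closing a group whenever the
    key changes. Correct only if equal keys are adjacent in the input."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(rows)]


# Made-up (ProductID, CompanyName) rows, deliberately not sorted.
rows = [("P1", "Acme"), ("P2", "Acme"), ("P1", "Acme")]

unsorted_result = sort_method_count(rows)
sorted_result = sort_method_count(sorted(rows))

print(unsorted_result)  # the ("P1", "Acme") group is split into two outputs
print(sorted_result)    # each group appears once with its full count
```

With unsorted input the same group is emitted more than once with partial counts; sorting on the grouping keys first gives one row per group.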