I have a job with the following design.
From a dataset I have one Copy stage; from that Copy stage one link goes to an Aggregator stage and another link goes to an rdup stage. The outputs of the Aggregator and rdup stages are then joined. The Aggregator keys, rdup keys and join keys are the same, and I have specified the partitioning properly.
The input of the Aggregator is hash partitioned and sorted on the aggregation keys. Now when I set the Aggregator method to Hash, the number of records output from the Join stage is different from the Join output when the Aggregator method is Sort.
Strange behaviour of aggregator stage
-
- Participant
- Posts: 152
- Joined: Mon Mar 07, 2011 6:16 am
Re: Strange behaviour of aggregator stage
If you sort the data, then you should be using the "Sort" method in the Aggregator. "Hash" is used if you didn't sort the data in advance.
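To make the difference between the two methods concrete, here is a small Python sketch of the two general strategies. It is only an illustration of sort-based versus hash-based grouping, not DataStage's actual implementation; the function names `aggregate_sorted` and `aggregate_hashed` and the sample rows are made up for the example.

```python
from collections import defaultdict
from itertools import groupby


def aggregate_sorted(rows):
    """Streaming group-by: emits a group each time the key changes.

    Only gives one row per key if the input is already ordered by key.
    """
    return [(key, sum(value for _, value in group))
            for key, group in groupby(rows, key=lambda r: r[0])]


def aggregate_hashed(rows):
    """Hash-table group-by: keeps one running total per key in memory.

    Makes no assumption about the order of the input.
    """
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return sorted(totals.items())


# Hypothetical sample data: (key, value) pairs, not sorted by key.
rows = [("a", 1), ("b", 2), ("a", 3)]
sorted_rows = sorted(rows, key=lambda r: r[0])

print(aggregate_sorted(sorted_rows))  # one row per key
print(aggregate_hashed(rows))         # one row per key, any input order
print(aggregate_sorted(rows))         # key "a" is split into two groups
```

The streaming approach needs very little memory but depends on the sort order, while the hash approach works on any order and holds every distinct key in memory. Whether this explains the row-count difference in the job above is not something the sketch can confirm.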
soumya5891 wrote: I have a job with the following design.
From a dataset I have one Copy stage; from that Copy stage one link goes to an Aggregator stage and another link goes to an rdup stage. The outputs of the Aggregator and rdup stages are then joined. The Aggregator keys, rdup keys and join keys are the same, and I have specified the partitioning properly.
The input of the Aggregator is hash partitioned and sorted on the aggregation keys. Now when I set the Aggregator method to Hash, the number of records output from the Join stage is different from the Join output when the Aggregator method is Sort.
Regards
Felix
-
- Participant
- Posts: 597
- Joined: Fri Apr 29, 2005 6:19 am
- Location: Singapore