Strange behaviour of aggregator stage

soumya5891
Participant
Posts: 152
Joined: Mon Mar 07, 2011 6:16 am

Strange behaviour of aggregator stage

Post by soumya5891 »

I have a job with the following design.

The job reads a dataset into a Copy stage. One output link from the Copy stage goes to an Aggregator stage and another goes to a Remove Duplicates (rdup) stage. The outputs of the Aggregator and the Remove Duplicates stage are then joined. The Aggregator keys, Remove Duplicates keys, and Join keys are the same, and I have set the partitioning properly.

The input to the Aggregator is hash partitioned and sorted on the aggregation keys. When I set the Aggregator method to Hash, the number of records output by the Join stage differs from the number output when the Aggregator method is Sort.
Soumya
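
For readers trying to picture the design, here is a minimal Python sketch of the data flow described above. It is not DataStage code: the function name `run_job`, the sample rows, the row-count aggregation, and the inner-join semantics are all assumptions made for illustration. Under those assumptions, the join should produce exactly one row per distinct key, whichever way the aggregation is computed.

```python
from collections import defaultdict

def run_job(rows):
    """Toy model of: dataset -> Copy -> (Aggregator, Remove Duplicates) -> Join.

    rows is a list of (key, value) pairs. All names here are hypothetical.
    """
    # Copy stage: both branches receive the same rows.
    agg_input = list(rows)
    rdup_input = list(rows)

    # Aggregator branch: one output row per key (assumed aggregation: row count).
    counts = defaultdict(int)
    for key, _ in agg_input:
        counts[key] += 1

    # Remove Duplicates branch: keep the first row seen for each key.
    first = {}
    for key, value in rdup_input:
        first.setdefault(key, value)

    # Join (assumed inner join) on the shared key.
    return [(key, first[key], counts[key]) for key in first if key in counts]

rows = [("k1", "x"), ("k1", "y"), ("k2", "z")]
joined = run_job(rows)
distinct_keys = {key for key, _ in rows}
```

Comparing `len(joined)` with `len(distinct_keys)` is a quick sanity check: if the real job's join output differs between the two Aggregator methods, at least one of them is not producing one row per key.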
felixyong
Participant
Posts: 35
Joined: Tue Jul 22, 2003 7:24 pm
Location: Australia

Re: Strange behaviour of aggregator stage

Post by felixyong »

If you sort the data beforehand, you should use the "sort" method in the Aggregator. "Hash" is for when the data has not been sorted in advance.
Regards
Felix
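
The difference between the two methods can be illustrated with a small Python sketch. This is a toy model, not how DataStage implements the Aggregator: the function names and sample rows are hypothetical. It assumes the usual meaning of the terms, namely that a sort-style aggregation closes a group whenever the key changes (so it relies on key-sorted input), while a hash-style aggregation keeps a running total per key in a lookup table and does not care about input order.

```python
from collections import defaultdict
from itertools import groupby

def aggregate_sort_method(rows):
    """Sort-style: emit one group per run of equal keys; assumes key-sorted input."""
    return [(key, sum(value for _, value in group))
            for key, group in groupby(rows, key=lambda row: row[0])]

def aggregate_hash_method(rows):
    """Hash-style: accumulate per key in a table; input order does not matter."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return list(totals.items())

sorted_rows = [("a", 1), ("a", 2), ("b", 5)]
unsorted_rows = [("a", 1), ("b", 5), ("a", 2)]

# On key-sorted input both approaches produce one row per key.
same_on_sorted = aggregate_sort_method(sorted_rows) == aggregate_hash_method(sorted_rows)

# On unsorted input the sort-style version splits key "a" into two groups.
sort_groups = len(aggregate_sort_method(unsorted_rows))
hash_groups = len(aggregate_hash_method(unsorted_rows))
```

In this model the two methods only disagree when the input is not actually sorted on the grouping keys within each partition, which is one thing worth verifying when their row counts differ.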
kandyshandy
Participant
Posts: 597
Joined: Fri Apr 29, 2005 6:19 am
Location: Singapore

Post by kandyshandy »

Soumya,

Read the IBM documentation for this issue. It is barely a page long and will give you a clear picture.

The method you choose should depend on your data and your aggregation requirements!
Kandy
_________________
Try and try again… You will succeed at last!!