Folks,
I'm processing a million records through a Sort stage and an Aggregator stage.
In the Sort stage I sort on 4 keys in ascending order. In the Aggregator stage I group by 4 keys, with hash partitioning on the 4 keys.
I'm using the sort method in the Aggregator.
When I run the job 4 times with the same input dataset, I get 4 different record counts out of the Aggregator.
Any ideas?
The sorting keys and the partitioning keys are in the same order.
For example, if I run the job for only one specific key, I get the exact records and the counts match. If I run it for a month (a million records), I see the difference.
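
To make the setup concrete, here is a rough Python sketch of what I expect the Sort and Aggregator stages to do together. This is not the DataStage job itself, only an illustration: the column names k1..k4, the row_count output column and the function name are placeholders made up for this example.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical names for the 4 key columns (placeholders, not the real columns).
KEYS = ("k1", "k2", "k3", "k4")


def sort_and_aggregate(records):
    """Sort on the 4 keys ascending, then count the rows in each key group."""
    key_of = itemgetter(*KEYS)
    # Sort step: ascending on all 4 keys, in the same key order used for grouping.
    ordered = sorted(records, key=key_of)
    # Aggregation step: groupby only merges adjacent rows, so it relies on the sort.
    return [
        {**dict(zip(KEYS, key)), "row_count": sum(1 for _ in group)}
        for key, group in groupby(ordered, key=key_of)
    ]
```

With this logic the number of output rows depends only on the distinct key combinations in the input, so I would expect the same count on every run for the same dataset.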