From a copy stage,cpy_split, i am splitting the record into two links. one goes to the Aggregator,Agg_sum, and one
pass to a Join stage,Jn_k1.The link which is passed to the Aggregator,Agg_sum is Hashed and sorted on a Key,say,K1.
In Agg_sum we are calculating sum of a column based on key,k1.Now, the link from Aggregator,Agg_sum is joined with
the join,Jn_k1.
The join has two inputs the link.From Aggregator,Agg_sum, and the link from copy stage,cpy_split. The key on which the
records are getting joined is again the key,k1.
The partitioning i have followed for the input to the join is as follows:
From the copy stage,cpy_split, the data is hashed and sorted on key k1
From the Aggregator,Agg_sum the data is same partitioned as before entering the Aggregator i have hashed and sorted
on the key,k1.
In the output one set of data was missing.
![Exclamation :!:](./images/smilies/icon_exclaim.gif)
Let the output from Aggregator be:
k1
"1","310","4"
"3","342","4"
"4","887","5"
"2","543","4"
Let the output from copy stage be:
k1
"1","X","0001","00000000123","210","349"
"1","X","0002","00000000114","110","349"
"1","X","0003","00000000272","310","349"
"1","X","0004","00000000256","121","349"
"3","X","0001","00000000364","342","147"
"3","X","0002","00000000489","111","147"
"3","X","0003","00000000954","222","147"
"3","X","0004","00000000326","333","147"
"2","X","0001","00000000895","432","247"
"2","X","0002","00000000654","543","247"
"2","X","0003","00000000156","123","247"
"2","X","0004","00000000258","232","247"
"4","X","0001","00000000195","444","47"
"4","X","0002","00000000659","555","47"
"4","X","0003","00000000485","666","47"
"4","X","0004","00000000147","777","47"
"4","X","0004","00000000147","887","47"
The output from the join stage is:
k1
"1","310","4","X","0001","00000000123","210","349","100","489"
"1","310","4","X","0003","00000000272","310","349","0","489"
"1","310","4","X","0004","00000000256","121","349","189","489"
"1","310","4","X","0002","00000000114","110","349","200","489"
"3","342","4","X","0001","00000000364","342","147","0","360"
"3","342","4","X","0004","00000000326","333","147","9","360"
"3","342","4","X","0003","00000000954","222","147","120","360"
"3","342","4","X","0002","00000000489","111","147","231","360"
"4","887","5","X","0003","00000000485","666","47","221","1106"
"4","887","5","X","0004","00000000147","777","47","110","1106"
"4","887","5","X","0001","00000000195","444","47","443","1106"
"4","887","5","X","0002","00000000659","555","47","332","1106"
"4","887","5","X","0004","00000000147","887","47","0","1106"
If you observe the data you can find that the value for Key,k1=2 is not found on the output.
But, If I change the partitioning as below i am getting all the rows as output:
The change is as follows:
The partitioning i have changed for the input to the join is as follows:
From the copy stage,cpy_split, the data is hashed and sorted on key k1(same as before)
From the Aggregator,Agg_sum the data is same partitioned and sorted on key k1.
Could someone tell me what the issue is and more about hash partitioning?
![Question :?:](./images/smilies/icon_question.gif)