Hash Partitioning

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Madhavan VM
Participant
Posts: 33
Joined: Sat Jul 02, 2005 2:27 am
Location: Bangalore

Hash Partitioning

Post by Madhavan VM »

The question is related to partitioning

From a copy stage,cpy_split, i am splitting the record into two links. one goes to the Aggregator,Agg_sum, and one
pass to a Join stage,Jn_k1.The link which is passed to the Aggregator,Agg_sum is Hashed and sorted on a Key,say,K1.
In Agg_sum we are calculating sum of a column based on key,k1.Now, the link from Aggregator,Agg_sum is joined with
the join,Jn_k1.

The join has two inputs the link.From Aggregator,Agg_sum, and the link from copy stage,cpy_split. The key on which the
records are getting joined is again the key,k1.


The partitioning i have followed for the input to the join is as follows:
From the copy stage,cpy_split, the data is hashed and sorted on key k1
From the Aggregator,Agg_sum the data is same partitioned as before entering the Aggregator i have hashed and sorted
on the key,k1.

In the output one set of data was missing. :!:

Let the output from Aggregator be:

k1
"1","310","4"
"3","342","4"
"4","887","5"
"2","543","4"

Let the output from copy stage be:

k1
"1","X","0001","00000000123","210","349"
"1","X","0002","00000000114","110","349"
"1","X","0003","00000000272","310","349"
"1","X","0004","00000000256","121","349"
"3","X","0001","00000000364","342","147"
"3","X","0002","00000000489","111","147"
"3","X","0003","00000000954","222","147"
"3","X","0004","00000000326","333","147"
"2","X","0001","00000000895","432","247"
"2","X","0002","00000000654","543","247"
"2","X","0003","00000000156","123","247"
"2","X","0004","00000000258","232","247"
"4","X","0001","00000000195","444","47"
"4","X","0002","00000000659","555","47"
"4","X","0003","00000000485","666","47"
"4","X","0004","00000000147","777","47"
"4","X","0004","00000000147","887","47"


The output from the join stage is:
k1
"1","310","4","X","0001","00000000123","210","349","100","489"
"1","310","4","X","0003","00000000272","310","349","0","489"
"1","310","4","X","0004","00000000256","121","349","189","489"
"1","310","4","X","0002","00000000114","110","349","200","489"
"3","342","4","X","0001","00000000364","342","147","0","360"
"3","342","4","X","0004","00000000326","333","147","9","360"
"3","342","4","X","0003","00000000954","222","147","120","360"
"3","342","4","X","0002","00000000489","111","147","231","360"
"4","887","5","X","0003","00000000485","666","47","221","1106"
"4","887","5","X","0004","00000000147","777","47","110","1106"
"4","887","5","X","0001","00000000195","444","47","443","1106"
"4","887","5","X","0002","00000000659","555","47","332","1106"
"4","887","5","X","0004","00000000147","887","47","0","1106"

If you observe the data you can find that the value for Key,k1=2 is not found on the output.

But, If I change the partitioning as below i am getting all the rows as output:
The change is as follows:

The partitioning i have changed for the input to the join is as follows:
From the copy stage,cpy_split, the data is hashed and sorted on key k1(same as before)
From the Aggregator,Agg_sum the data is same partitioned and sorted on key k1.

Could someone tell me what the issue is and more about hash partitioning? :?:
warm regards,
Ajith GK
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Immediately when I see input of 1,1,1,1,3,3,3,3,2,2,2,2,4,4,4 results in an output from the join of 1,3,4 I would look at sorting problems. The join stage needs both links to be sorted and usually adds a join if one does not exist. Check your symbols on your links to see if the sort appears on both join input links. Why don't you sort before your aggregation? Output from aggregation is 1,3,2,4. Why not get the 1,2,3,4 order correct from the beginning and sort and partition by k1 all the way through.
elavenil
Premium Member
Premium Member
Posts: 467
Joined: Thu Jan 31, 2002 10:20 pm
Location: Singapore

Post by elavenil »

As suggessted by Vincent, sort & partition could cause this issue. Pls look at the generated OSH code to identify the sort and partition that are used in the aggregator and join stage. And use the right partition and sort the data that will help you in resolving the issue.

Regards
Saravanan
Post Reply