Hashing keys and grouping columns

meet_deb85
Premium Member
Posts: 132
Joined: Tue Sep 04, 2007 11:38 am
Location: NOIDA

Post by meet_deb85 »

Hi All,
I have some confusion about a job. In some of our jobs, the data are hash partitioned on, say, columns A, B and C prior to the Aggregator stage, and in the Aggregator the grouping is done on A, B, C and D (where D is not constant).
Will the result be correct, and what will be the impact on performance?

thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

What's your confusion?

If data are partitioned on A, B and C then - for any particular combination of A, B and C - all values of D will be on the one node, so grouping by A, B, C and D will yield accurate results.
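To illustrate the point, here is a minimal Python sketch (not DataStage code) that simulates hash partitioning on A, B and C and then groups each partition separately by A, B, C and D. The sample rows, the partition count of 4 and the helper names `partition_by_hash` and `count_groups` are all made up for the illustration; Python's built-in `hash` stands in for whatever hash function the engine actually uses.

```python
from collections import Counter

def partition_by_hash(rows, key_cols, num_partitions):
    """Send each row to a partition chosen by hashing its key columns."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        partitions[hash(key) % num_partitions].append(row)
    return partitions

def count_groups(rows, group_cols):
    """Count rows per distinct combination of the grouping columns."""
    return Counter(tuple(row[c] for c in group_cols) for row in rows)

# Hypothetical sample data: D varies within the same (A, B, C).
rows = [
    {"A": 1, "B": "x", "C": 10, "D": "p"},
    {"A": 1, "B": "x", "C": 10, "D": "q"},
    {"A": 1, "B": "x", "C": 10, "D": "p"},
    {"A": 2, "B": "y", "C": 20, "D": "p"},
    {"A": 2, "B": "y", "C": 20, "D": "r"},
    {"A": 3, "B": "z", "C": 30, "D": "q"},
]

group_cols = ["A", "B", "C", "D"]
partitions = partition_by_hash(rows, ["A", "B", "C"], num_partitions=4)

# Aggregate each partition independently, as each node would.
combined = Counter()
groups_split_across_partitions = False
for part in partitions:
    local = count_groups(part, group_cols)
    if combined.keys() & local.keys():
        groups_split_across_partitions = True
    combined.update(local)

# Compare with grouping the whole data set in one place.
print(combined == count_groups(rows, group_cols))
print(groups_split_across_partitions)
```

Because every row with the same A, B and C lands in the same partition, no (A, B, C, D) group can be split across partitions, so the per-partition results simply add up to the global result.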

Performance is immaterial: there is only one way to get the correct result, namely grouping by A, B, C and D (though partitioning on A alone would probably work as well). If the data are sorted by A (and maybe then by B and C), the Sort method for the Aggregator stage will probably finish faster than the Hash method for reasonable volumes of data.
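As a rough picture of the difference between the two methods, the Python sketch below contrasts a hash-style aggregation, which keeps a running total for every group in memory, with a sort-style aggregation, which streams over sorted input and holds only the current group. This is a generic illustration, not how the Aggregator stage is implemented internally; the `amount` column, the sample rows and the function names are hypothetical, and the sketch assumes the input to the sorted version is sorted on all of the grouping columns.

```python
from itertools import groupby

def make_key(group_cols):
    """Build a function that extracts the grouping key of a row as a tuple."""
    return lambda row: tuple(row[c] for c in group_cols)

def aggregate_hash(rows, group_cols, value_col):
    """Hash style: one dictionary entry per group, input order irrelevant."""
    key_of = make_key(group_cols)
    totals = {}
    for row in rows:
        key = key_of(row)
        totals[key] = totals.get(key, 0) + row[value_col]
    return totals

def aggregate_sorted(sorted_rows, group_cols, value_col):
    """Sort style: emits each group as soon as its key changes.

    Requires the input to be sorted on group_cols already.
    """
    for key, group in groupby(sorted_rows, key=make_key(group_cols)):
        yield key, sum(row[value_col] for row in group)

# Hypothetical sample data with a made-up numeric column to total.
rows = [
    {"A": 2, "B": "y", "C": 20, "D": "r", "amount": 5},
    {"A": 1, "B": "x", "C": 10, "D": "p", "amount": 3},
    {"A": 1, "B": "x", "C": 10, "D": "q", "amount": 7},
    {"A": 1, "B": "x", "C": 10, "D": "p", "amount": 4},
    {"A": 2, "B": "y", "C": 20, "D": "r", "amount": 1},
]

group_cols = ["A", "B", "C", "D"]
hash_result = aggregate_hash(rows, group_cols, "amount")
sorted_rows = sorted(rows, key=make_key(group_cols))
sort_result = dict(aggregate_sorted(sorted_rows, group_cols, "amount"))

print(hash_result == sort_result)
```

Both approaches give the same totals; the sorted version trades the cost of having sorted input for a much smaller memory footprint when there are many distinct groups.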
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
meet_deb85
Premium Member
Posts: 132
Joined: Tue Sep 04, 2007 11:38 am
Location: NOIDA

Post by meet_deb85 »

Thanks, Ray.