Partitioning in Filter stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


visvacfirvin
Premium Member
Posts: 49
Joined: Fri Dec 14, 2007 1:43 pm

Post by visvacfirvin »

Hi,
I need a clarification regarding Partitions on Filter stage.

For example, consider the following set of records.

1,NY
2,NJ
3,NJ
4,NY
5,NJ
6,NY

Now I want to filter all the records from NY using a Filter stage (with a two-node configuration file). How does partitioning work in the following cases?

1. Auto partitioning: will the Filter stage use the filter columns to partition the records?
2. If I explicitly set Hash partitioning on the state name, will performance improve? Since the NY records would move to one node and the NJ records to another, will the system know not to apply the filter on the node that holds the NJ records?
3. Does setting the partitioning on Serial no affect performance?


Thanks,
Firvin
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

With only six rows nothing you do will make a lot of difference.

What Auto partitioning actually uses depends on what is upstream of the Filter stage. Hash partitioning may worsen performance if it causes your data to be skewed (for example, if you have many more NY rows than NJ rows). The Filter stage does not use the partitioning algorithm in its filtering calculations; it always evaluates all the WHERE conditions on every row it receives. Sequential execution (which is what I assume you mean by "serial") will not improve anything. Data do not need to be key partitioned for the Filter stage, so Round Robin will give the most equitable balance of rows across the available processing nodes. However, if a downstream stage does require key-partitioned data, then doing that partitioning as far upstream as possible will minimize the need for subsequent re-partitioning.
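To make the points above concrete, here is a toy Python simulation (not DataStage code and not any DataStage API) of the six rows being spread over two nodes by Round Robin and by Hash on the state column, with the filter then applied on each node. The function names are hypothetical, and `zlib.crc32` is only a stand-in for whatever hash DataStage uses internally. The sketch shows that Round Robin balances rows evenly, that Hash keeps all rows for a given state on a single node (which causes skew when one state dominates), and that the filter still tests every row on every node either way.

```python
# Toy simulation, NOT DataStage code: hypothetical helpers that model how rows
# might be distributed across processing nodes and then filtered per node.
from zlib import crc32

rows = [(1, "NY"), (2, "NJ"), (3, "NJ"), (4, "NY"), (5, "NJ"), (6, "NY")]
NODES = 2  # mirrors the two-node configuration file in the question


def round_robin(rows, nodes):
    """Deal rows to nodes in turn, like dealing cards."""
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts


def hash_partition(rows, nodes, key):
    """Send every row with the same key value to the same node."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        # crc32 is an assumed stand-in for DataStage's internal hash function
        parts[crc32(key(row).encode()) % nodes].append(row)
    return parts


def filter_partition(part, where):
    """Evaluate the WHERE condition on every row of one node's partition.

    Nothing here inspects how the rows were partitioned, so no node is skipped.
    """
    return [row for row in part if where(row)]


def is_ny(row):
    return row[1] == "NY"


if __name__ == "__main__":
    for name, parts in [
        ("round robin", round_robin(rows, NODES)),
        ("hash on state", hash_partition(rows, NODES, key=lambda r: r[1])),
    ]:
        kept = [filter_partition(p, is_ny) for p in parts]
        print(name, "rows per node:", [len(p) for p in parts], "kept:", kept)

    # Skewed input: 90 NY rows and 10 NJ rows.
    skewed = [(i, "NY") for i in range(1, 91)] + [(i, "NJ") for i in range(91, 101)]
    print("round robin, skewed:", [len(p) for p in round_robin(skewed, NODES)])
    print("hash, skewed:", [len(p) for p in hash_partition(skewed, NODES, key=lambda r: r[1])])
```

With the skewed input, Round Robin still gives each node 50 rows, while Hash on state puts at least the 90 NY rows on one node, so that node does most of the work.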
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.