Uniform distribution of dataset records across all nodes

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

adasgupta123
Participant
Posts: 42
Joined: Fri Oct 20, 2006 1:58 am

Uniform distribution of dataset records across all nodes

Post by adasgupta123 »

Hi,

How can I ensure that the records of a dataset are evenly distributed across all the DataStage server nodes? Do I need to make changes to the configuration file?

Regards

Avik Dasgupta
jhmckeever
Premium Member
Posts: 301
Joined: Thu Jul 14, 2005 10:27 am
Location: Melbourne, Australia
Contact:

Post by jhmckeever »

Round Robin partitioning will distribute data evenly regardless of keys. This creates evenly sized partitions and is normally what you get when you select 'Auto' partitioning. This obviously won't support joins or other stages which require matching keys from multiple streams to be present in the same partition.
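To see why round robin balances partition sizes regardless of key values, here is a plain Python sketch (illustrative only, not DataStage code; the record values and partition count are made up):

```python
# Round-robin partitioning: record i goes to partition i mod N,
# so partition sizes can differ by at most one record.
def round_robin(records, num_partitions):
    partitions = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        partitions[i % num_partitions].append(rec)
    return partitions

# 10 records over 4 partitions -> sizes as even as possible
parts = round_robin(list(range(10)), 4)
print([len(p) for p in parts])  # [3, 3, 2, 2]
```

Note that the record contents never influence placement, which is exactly why matching keys can end up in different partitions.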

Random partitioning achieves a similar spread, but carries slightly higher overhead from calculating the random partition assignments.

These partitioning schemes will use whatever nodes are defined in your APT configuration file (advanced options such as node pools aside).
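For reference, the nodes come from the configuration file pointed to by APT_CONFIG_FILE. A minimal two-node example is sketched below; the node names, fastname, and resource paths are illustrative and entirely site-specific:

```
{
    node "node1"
    {
        fastname "etl_server"
        pools ""
        resource disk "/data/ds/node1" {pools ""}
        resource scratchdisk "/scratch/ds/node1" {pools ""}
    }
    node "node2"
    {
        fastname "etl_server"
        pools ""
        resource disk "/data/ds/node2" {pools ""}
        resource scratchdisk "/scratch/ds/node2" {pools ""}
    }
}
```

Adding node entries increases the number of partitions a job runs with, which is the usual lever for spreading dataset records more widely.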

J.
John McKeever
Data Migrators
MettleCI (https://www.mettleci.com) - DevOps for DataStage
tagnihotri
Participant
Posts: 83
Joined: Sat Oct 28, 2006 6:25 am

Post by tagnihotri »

But be cautious and think about how you are going to use these datasets. If they feed joins, lookups, merges, etc., work out how you can avoid re-partitioning your data while still storing it effectively!
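To see why keyed stages need something other than round robin, here is a plain Python sketch (again illustrative, not DataStage code; the hash function and sample records are assumptions) of hash partitioning, which sends every record with the same key to the same partition so a join can run locally:

```python
from zlib import crc32

# Hash partitioning: partition = hash(key) mod N, so all records
# sharing a key land in the same partition and can be joined there
# without any further data movement.
def hash_partition(records, key_fn, num_partitions):
    partitions = [[] for _ in range(num_partitions)]
    for rec in records:
        p = crc32(str(key_fn(rec)).encode()) % num_partitions
        partitions[p].append(rec)
    return partitions

orders = [("cust1", 100), ("cust2", 50), ("cust1", 75), ("cust3", 20)]
parts = hash_partition(orders, lambda r: r[0], 3)
# Both "cust1" rows are guaranteed to share a partition.
```

The trade-off is the mirror image of round robin: partition sizes now depend on the key distribution, so skewed keys give uneven partitions.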

jhmckeever wrote:Round Robin partitioning will distribute data evenly regardless of keys. [...]