Uniform distribution of dataset records across all nodes

Posted: Thu Nov 09, 2006 5:50 am
by adasgupta123
Hi,

How can I ensure that the records of a dataset are evenly distributed across all the DS server nodes? Do I need to make any changes in the conf file?

Regards

Avik Dasgupta

Posted: Thu Nov 09, 2006 5:54 am
by jhmckeever
Round Robin partitioning will distribute data evenly regardless of keys. This creates evenly sized partitions and is typically what 'Auto' partitioning picks when data is first partitioned and the stage has no key requirement. Because it ignores keys, it won't on its own support joins or other stages which require matching keys from multiple streams to be present in the same partition.
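
To make the trade-off concrete, here is a minimal Python sketch (not DataStage code, and not how the engine is implemented) that compares round robin with a key-based hash on a deliberately skewed input. The record layout, the `cust` key, the four-node count and the use of `crc32` as the hash are all assumptions made for illustration.

```python
from zlib import crc32


def round_robin(records, n_nodes):
    """Deal records to partitions in turn, ignoring their contents."""
    parts = [[] for _ in range(n_nodes)]
    for i, rec in enumerate(records):
        parts[i % n_nodes].append(rec)
    return parts


def hash_by_key(records, n_nodes, key):
    """Send every record with the same key value to the same partition."""
    parts = [[] for _ in range(n_nodes)]
    for rec in records:
        # crc32 is only a stand-in for whatever hash a real engine uses.
        parts[crc32(str(rec[key]).encode()) % n_nodes].append(rec)
    return parts


def nodes_holding(parts, key, value):
    """Indexes of the partitions that contain at least one matching record."""
    return [i for i, p in enumerate(parts) if any(r[key] == value for r in p)]


# Hypothetical skewed input: 90 records for customer 1, one each for 2..11.
records = [{"cust": 1, "seq": i} for i in range(90)]
records += [{"cust": c, "seq": 0} for c in range(2, 12)]

rr = round_robin(records, 4)
hp = hash_by_key(records, 4, "cust")

print("round robin sizes:", [len(p) for p in rr])   # [25, 25, 25, 25]
print("hash sizes:       ", [len(p) for p in hp])   # one partition holds 90+
print("customer 1 under round robin is on nodes", nodes_holding(rr, "cust", 1))
print("customer 1 under hash is on nodes       ", nodes_holding(hp, "cust", 1))
```

Round robin gives equal partition sizes but spreads customer 1 over every node, while the hash keeps customer 1 together at the cost of one overloaded partition. That is why a key-based stage downstream usually forces a re-partition.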

Random achieves similar results but has a slightly higher overhead, since it has to calculate the random-ness (is that a word?) for every record.
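
As a rough illustration of "similar results", the Python sketch below counts how many records each node receives under both schemes. It is only a model under assumed numbers (10,000 records, 4 nodes, a fixed seed) and says nothing about the engine's actual random generator.

```python
import random


def random_counts(n_records, n_nodes, seed=0):
    """Pick a node at random for each record; sizes are only roughly even."""
    rng = random.Random(seed)  # one random draw per record is the extra cost
    counts = [0] * n_nodes
    for _ in range(n_records):
        counts[rng.randrange(n_nodes)] += 1
    return counts


def round_robin_counts(n_records, n_nodes):
    """Deal records in turn; sizes differ by at most one record."""
    counts = [0] * n_nodes
    for i in range(n_records):
        counts[i % n_nodes] += 1
    return counts


rnd = random_counts(10_000, 4)
rr = round_robin_counts(10_000, 4)

print("random     :", rnd, "spread =", max(rnd) - min(rnd))
print("round robin:", rr, "spread =", max(rr) - min(rr))
```

In this model round robin is even to within one record, whereas random is even only on average, so its partition sizes drift a little from run to run.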

These partitioning schemes will use whatever nodes are defined in your APT configuration file, i.e. the one $APT_CONFIG_FILE points to (advanced options like node pools aside).

J.

Posted: Thu Nov 09, 2006 7:18 am
by tagnihotri
But be cautious and think about how you are going to use these datasets. If they will feed joins, lookups, merges, etc., work out how you can avoid re-partitioning your data and still store it effectively!