Uniform distribution of dataset records across all nodes

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

adasgupta123
Participant
Posts: 42
Joined: Fri Oct 20, 2006 1:58 am

Uniform distribution of dataset records across all nodes

Post by adasgupta123 »

Hi,

How can I ensure that the records of a dataset are evenly distributed across all the DataStage server nodes? Do I need to make changes to the configuration file?

Regards

Avik Dasgupta
jhmckeever
Premium Member
Posts: 301
Joined: Thu Jul 14, 2005 10:27 am
Location: Melbourne, Australia
Contact:

Post by jhmckeever »

Round Robin partitioning will distribute data evenly regardless of keys. This creates evenly sized partitions and is normally what you get when you select 'Auto' partitioning. This obviously won't support joins or other stages which require matching keys from multiple streams to be present in the same partition.
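To see why round robin balances partition sizes regardless of key values, here is a plain Python sketch (illustrative only, not DataStage code; the record values and partition count are made up):

```python
# Round-robin partitioning: record i goes to partition i mod N,
# so partition sizes can differ by at most one record.
def round_robin(records, num_partitions):
    partitions = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        partitions[i % num_partitions].append(rec)
    return partitions

# 10 records over 4 partitions -> sizes as even as possible
parts = round_robin(list(range(10)), 4)
print([len(p) for p in parts])  # [3, 3, 2, 2]
```

Note that the record contents never influence placement, which is exactly why matching keys can end up in different partitions.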

Random partitioning achieves a similar spread, but carries slightly higher overhead from calculating the random partition assignments.

These partitioning schemes will use whatever nodes are defined in your APT configuration file (advanced options such as node pools aside).
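For reference, the nodes come from the configuration file pointed to by APT_CONFIG_FILE. A minimal two-node example is sketched below; the node names, fastname, and resource paths are illustrative and entirely site-specific:

```
{
    node "node1"
    {
        fastname "etl_server"
        pools ""
        resource disk "/data/ds/node1" {pools ""}
        resource scratchdisk "/scratch/ds/node1" {pools ""}
    }
    node "node2"
    {
        fastname "etl_server"
        pools ""
        resource disk "/data/ds/node2" {pools ""}
        resource scratchdisk "/scratch/ds/node2" {pools ""}
    }
}
```

Adding node entries increases the number of partitions a job runs with, which is the usual lever for spreading dataset records more widely.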

J.
John McKeever
Data Migrators
MettleCI (https://www.mettleci.com) - DevOps for DataStage
tagnihotri
Participant
Posts: 83
Joined: Sat Oct 28, 2006 6:25 am

Post by tagnihotri »

But be cautious and think about how you are going to use these datasets. If they feed joins, lookups, merges, etc., work out how you can avoid re-partitioning your data while still storing it effectively!
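To see why keyed stages need something other than round robin, here is a plain Python sketch (again illustrative, not DataStage code; the hash function and sample records are assumptions) of hash partitioning, which sends every record with the same key to the same partition so a join can run locally:

```python
from zlib import crc32

# Hash partitioning: partition = hash(key) mod N, so all records
# sharing a key land in the same partition and can be joined there
# without any further data movement.
def hash_partition(records, key_fn, num_partitions):
    partitions = [[] for _ in range(num_partitions)]
    for rec in records:
        p = crc32(str(key_fn(rec)).encode()) % num_partitions
        partitions[p].append(rec)
    return partitions

orders = [("cust1", 100), ("cust2", 50), ("cust1", 75), ("cust3", 20)]
parts = hash_partition(orders, lambda r: r[0], 3)
# Both "cust1" rows are guaranteed to share a partition.
```

The trade-off is the mirror image of round robin: partition sizes now depend on the key distribution, so skewed keys give uneven partitions.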

jhmckeever wrote:Round Robin partitioning will distribute data evenly regardless of keys. [...]