Dataset created on only one node

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
srinivas.nettalam
Participant
Posts: 134
Joined: Tue Jun 15, 2010 2:10 am
Location: Bangalore

Dataset created on only one node

Post by srinivas.nettalam »

I have a Sequential File stage as the source, then a Copy stage, and then a Data Set stage. The partitioning on both the Copy stage and the Data Set is "Auto". I observed that the dataset is created on only 1 node even though the job ran on 4 nodes. I assumed that the Copy stage invokes round robin by default and that the records would be distributed among the 4 nodes. Is there a specific reason for this behaviour? Please let me know.
N.Srinivas
India.
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Is auto partitioning disabled in your environment? $APT_NO_PART_INSERTION=1

Alternatively, the Copy stage was probably optimized out by the engine at job submission. In that case, no partitioner was inserted in front of the Data Set stage when the job ran, and therefore the data was not repartitioned. You can explicitly specify the partitioning on the input of the Copy or Data Set stage to resolve this.
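For anyone unfamiliar with what round-robin partitioning does, here is a minimal conceptual sketch in plain Python (this is an illustration only, not DataStage code; the 4-node count mirrors the configuration described in the question):

```python
NODES = 4  # matches the 4-node configuration from the original post

def round_robin_partition(records, nodes=NODES):
    """Assign record i to partition i % nodes, as a round-robin partitioner does."""
    partitions = [[] for _ in range(nodes)]
    for i, rec in enumerate(records):
        partitions[i % nodes].append(rec)
    return partitions

# 10 records spread across 4 partitions: each node gets 2 or 3 records.
parts = round_robin_partition(list(range(10)))
```

If the engine drops the Copy stage and inserts no partitioner, no such distribution happens and all records can land in a single partition.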

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

For a sufficiently small volume of data (either < 32KB or < 128KB, I can't recall which) a Data Set will only be created on one node: there's no point in splitting the data, since DataStage moves data around in chunks of not less than 32KB.
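The arithmetic behind this is easy to sketch. Assuming a 32KB transport block as described above (the exact threshold is uncertain, per the post), data small enough to fit in a single block gives the framework nothing to split; this plain-Python illustration is not DataStage code:

```python
BLOCK_SIZE = 32 * 1024  # assumed transport-block size in bytes, per the post

def blocks_needed(record_count, avg_record_bytes):
    """Ceiling of total payload size divided by the transport-block size."""
    total = record_count * avg_record_bytes
    return max(1, -(-total // BLOCK_SIZE))  # -(-a // b) is ceiling division

# 200 records of ~100 bytes is 20,000 bytes: one block, so one node suffices.
blocks_needed(200, 100)
# 5,000 such records is 500,000 bytes: many blocks, worth distributing.
blocks_needed(5000, 100)
```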
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
srinivas.nettalam
Participant
Posts: 134
Joined: Tue Jun 15, 2010 2:10 am
Location: Bangalore

Post by srinivas.nettalam »

jwiles wrote:Is auto partitioning disabled in your environment? $APT_NO_PART_INSERTION=1

Alternatively, the Copy stage was probably optimized out by the engine at job submission. In that case, no partitioner was inserted in front of the Data Set stage when the job ran, and therefore the data was not repartitioned. You can explicitly specify the partitioning on the input of the Copy or Data Set stage to resolve this.

Regards,
When I set the partitioning to Round Robin in the Copy stage, the dataset is created on all the nodes for the same data.
N.Srinivas
India.
Post Reply