Dataset created on only one node

Posted: Thu Jun 09, 2011 5:43 am
by srinivas.nettalam
My job has a Sequential File stage as the source, followed by a Copy stage and a Data Set. The partitioning on both the Copy stage and the Data Set is set to "Auto". I observed that the dataset was created on only one node, even though the job ran on 4 nodes. I had assumed that the Copy stage would invoke round robin by default and that the records would be distributed among the 4 nodes. Is there a specific reason for this behaviour? Please let me know.

Posted: Thu Jun 09, 2011 7:11 am
by jwiles
Is auto partitioning disabled in your environment, i.e. is $APT_NO_PART_INSERTION=1 set?

Or, the copy stage was probably optimized out by the engine at submission. In that case, probably no partitioner was inserted in front of the dataset stage when the job ran and therefore the data was not repartitioned. You can specify the partitioning at the input of the copy or dataset stages to resolve this.
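To make the difference concrete, here is a minimal sketch in plain Python (not DataStage code) comparing what happens to the records with and without a round robin partitioner in front of the target. It assumes the source delivers all records on a single partition, which is typical for a sequentially read file; the function names round_robin and no_repartition are made up for illustration.

```python
def round_robin(records, num_nodes):
    """Deal records to the partitions in turn, one record at a time."""
    partitions = [[] for _ in range(num_nodes)]
    for i, record in enumerate(records):
        partitions[i % num_nodes].append(record)
    return partitions


def no_repartition(records, num_nodes, source_node=0):
    """Leave every record on the partition it arrived on (no partitioner inserted)."""
    partitions = [[] for _ in range(num_nodes)]
    partitions[source_node].extend(records)
    return partitions


records = list(range(10))

# With an explicit round robin partitioner, all 4 partitions receive data.
print([len(p) for p in round_robin(records, 4)])     # [3, 3, 2, 2]

# Without one, everything stays on the single source partition.
print([len(p) for p in no_repartition(records, 4)])  # [10, 0, 0, 0]
```

The second result matches the symptom described: the job runs on 4 nodes, but only one of them ever holds data.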

Regards,

Posted: Thu Jun 09, 2011 4:37 pm
by ray.wurlod
For a sufficiently small volume of data (either < 32KB or < 128KB, I can't recall which) a Data Set will only be created on one node - there's no point in splitting the data since DataStage moves data around in chunks of not less than 32KB.
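The reasoning above can be sketched as a toy model in plain Python. It assumes that data is handed out in whole blocks and that the block size is 32KB (the post is unsure whether the threshold is 32KB or 128KB). This only illustrates the argument; it is not a description of what the engine actually does, and the function name nodes_receiving_data is hypothetical.

```python
BLOCK_SIZE = 32 * 1024  # assumed block size in bytes, taken from the post above


def nodes_receiving_data(total_bytes, num_nodes, block_size=BLOCK_SIZE):
    """Count the nodes that get at least one block when whole blocks are dealt out in turn."""
    if total_bytes <= 0:
        return 0
    num_blocks = -(-total_bytes // block_size)  # ceiling division
    return min(num_blocks, num_nodes)


print(nodes_receiving_data(10 * 1024, 4))   # 1
print(nodes_receiving_data(100 * 1024, 4))  # 4
```

Under these assumptions, anything smaller than one block lands on a single node no matter how many nodes the job runs on.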

Posted: Sun Jun 12, 2011 11:40 pm
by srinivas.nettalam
When I set the partitioning to round robin in the Copy stage, the dataset is created on all the nodes for the same data.