Dataset created on only one node
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 134
- Joined: Tue Jun 15, 2010 2:10 am
- Location: Bangalore
I have a Sequential File stage as the source, followed by a Copy stage and a Data Set stage. The partitioning on both the Copy and Data Set stages is set to "Auto". I observed that the dataset is created on only 1 node, even though the job ran on 4 nodes. I assumed that the Copy stage invokes round robin by default and that the records would be distributed among the 4 nodes. Is there a specific reason for this behaviour? Please let me know.
N.Srinivas
India.
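To make the expectation in the question concrete, here is a minimal Python sketch of round-robin partitioning, where record i goes to partition i mod N. It only illustrates the technique; the function name `round_robin` and the sample rows are hypothetical and are not DataStage code.

```python
def round_robin(records, num_partitions):
    """Deal records out in turn: record i goes to partition i % num_partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)
    return partitions


# Ten hypothetical rows spread over a 4-node configuration.
parts = round_robin([f"row{i}" for i in range(10)], 4)
for node, rows in enumerate(parts):
    print(f"node{node}: {len(rows)} rows")
```

With round robin actually in effect, every node would receive a share of the rows, which is what the poster expected to see in the dataset.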
Is auto partitioning disabled in your environment? $APT_NO_PART_INSERTION=1
Or, the copy stage was probably optimized out by the engine at submission. In that case, probably no partitioner was inserted in front of the dataset stage when the job ran and therefore the data was not repartitioned. You can specify the partitioning at the input of the copy or dataset stages to resolve this.
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
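The explanation in the reply above can be pictured with a toy Python model. It assumes the sequential source hands all of its records to a single partition, and that a downstream stage keeps whatever layout it receives unless a partitioner sits in front of it. The names `write_dataset` and `round_robin` are hypothetical; this is a sketch of the reasoning, not of the engine's actual behaviour.

```python
def round_robin(records, num_nodes):
    """Deal records out in turn: record i goes to node i % num_nodes."""
    nodes = [[] for _ in range(num_nodes)]
    for i, record in enumerate(records):
        nodes[i % num_nodes].append(record)
    return nodes


def write_dataset(input_partitions, num_nodes, partitioner=None):
    """Toy model: return the per-node record lists a dataset would end up with."""
    if partitioner is None:
        # No partitioner inserted: the upstream layout is kept as-is,
        # and the remaining nodes simply receive nothing.
        layout = [list(p) for p in input_partitions]
        return layout + [[] for _ in range(num_nodes - len(layout))]
    # Partitioner present: gather the records and redistribute them.
    all_records = [r for p in input_partitions for r in p]
    return partitioner(all_records, num_nodes)


# Assumption: a sequential read yields one partition holding every record.
source = [list(range(8))]
print(write_dataset(source, 4))               # no partitioner in front
print(write_dataset(source, 4, round_robin))  # explicit round robin
```

In this model, the first call leaves everything on one node, while the second spreads the same records over all four, which matches the effect of specifying the partitioning explicitly on the stage input.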
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
For a sufficiently small volume of data (either < 32KB or < 128KB, I can't recall which) a Data Set will only be created on one node - there's no point in splitting the data since DataStage moves data around in chunks of not less than 32KB.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
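The block-size argument in the post above can be sketched as simple arithmetic in Python. This is a toy model under two loudly stated assumptions: that data moves in whole blocks of 32KB (the post itself is unsure whether the threshold is 32KB or 128KB), and that whole blocks are dealt out to nodes one at a time. Neither is confirmed DataStage behaviour, and `partitions_touched` is a hypothetical helper.

```python
import math

# Assumed minimum transfer block; the post is unsure between 32KB and 128KB.
BLOCK_BYTES = 32 * 1024


def partitions_touched(total_bytes, num_nodes, block_bytes=BLOCK_BYTES):
    """Toy estimate of how many nodes receive data when whole blocks are dealt out."""
    if total_bytes <= 0:
        return 0
    blocks = math.ceil(total_bytes / block_bytes)
    # There cannot be more populated nodes than blocks, or than nodes available.
    return min(num_nodes, blocks)


for size_kb in (10, 40, 1024):
    nodes = partitions_touched(size_kb * 1024, 4)
    print(f"{size_kb}KB of data -> {nodes} node(s) populated")
```

Under these assumptions, anything that fits in a single block can only ever populate one node, however many nodes the configuration file defines.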
-
- Participant
- Posts: 134
- Joined: Tue Jun 15, 2010 2:10 am
- Location: Bangalore
When I set the partitioning to round robin in the Copy stage, the dataset is created on all the nodes for the same data.
jwiles wrote: Is auto partitioning disabled in your environment? $APT_NO_PART_INSERTION=1
Or, the copy stage was probably optimized out by the engine at submission. In that case, probably no partitioner was inserted in front of the dataset stage when the job ran and therefore the data was not repartitioned. You can specify the partitioning at the input of the copy or dataset stages to resolve this.
Regards,
N.Srinivas
India.