Sequential run
Which two would cause the stage to process its incoming data sequentially?
1. Execution mode of the stage is Sequential.
2. The stage follows a Sequential File stage and its preserve-partitioning flag has been set to Clear.
3. The stage follows a Sequential File stage and its partitioning type is set to Auto.
4. The stage has a constraint with a node pool containing one node.
Goutam Sahoo
The processing of data on a single node is always sequential. If you have two nodes, the job will run in parallel, but each node will process its own data sequentially.
So if you are constraining a stage to a node pool with one node, there will be only one instantiation of the stage: a single Orchestrate operator will be generated for that stage, which in turn will generate a single UNIX process.
Refer to the DataStage documentation for more details on data partitioning and parallelism.
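Option 4 can be made concrete with a parallel configuration file. The snippet below is only an illustrative sketch (the host name `etlserver`, pool name `sequential_pool`, and disk paths are all invented): `node1` belongs to a named pool containing just that one node, so a stage constrained to `sequential_pool` is generated as a single operator and a single process, while unconstrained stages run across both nodes.

```
{
  node "node1"
  {
    fastname "etlserver"
    pools "" "sequential_pool"
    resource disk "/data/datasets" {pools ""}
    resource scratchdisk "/data/scratch" {pools ""}
  }
  node "node2"
  {
    fastname "etlserver"
    pools ""
    resource disk "/data/datasets" {pools ""}
    resource scratchdisk "/data/scratch" {pools ""}
  }
}
```

Constraining a stage to `sequential_pool` (via the node pool and resource constraints on the stage's Advanced tab) limits that stage to `node1` only, which is exactly the single-process behaviour described above.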
Thanks
DSDexter
Because sequential processing is what you get on one node: a stage (operator) can create only one process on one processing node (unless it's a composite operator, a possibility that was not canvassed in the question).
Pipeline parallelism is between stages (operators), not within one stage, and is therefore outside the scope of the question.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DSDexter wrote: The processing of data on a single node is always sequential. If you have two nodes, the job will run in parallel, but each node will process its own data sequentially. So if you are constraining a stage to a node pool, there will be only one instantiation of the stage: a single Orchestrate operator will be generated for that stage, which in turn will generate a single UNIX process. Refer to the DataStage documentation for more details on data partitioning and parallelism.
When the job runs with a two-node configuration file, both partition and pipeline parallelism come into the picture. There will be 2 UNIX processes for each stage (operator), and each UNIX process will process the incoming data in parallel (here pipeline parallelism is applied). So how is the data processed sequentially? The stage processes the data sequentially when the stage itself is set to Sequential execution mode in the stage properties, not when it is constrained to a node pool having one node. Ray, please correct me if I am wrong.
Another point: when a parallel job is run with a 1-node configuration file, will the parallel job run in parallel? If yes, how do we benefit from a parallel job?
Goutam Sahoo
Goutam, let's make life simpler; I am doing nothing but reframing Ray's reasoning.
Data pipelining only comes into the picture when there is a flow of data from one stage (operator) to another. It does not come into the picture for a single stage.
Regarding your second question: when you run a job on a single node, you can only achieve pipeline parallelism. That is the benefit that you get.
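The point about pipeline parallelism on a single node can be sketched with an analogy. This is my own illustration, not DataStage code (the names `run_pipeline`, `producer`, and `consumer` are invented): in the parallel engine each operator becomes its own OS process, and here two threads linked by a queue stand in for two operators, so the downstream "stage" consumes rows while the upstream one is still producing them, with no data partitioning at all.

```python
import queue
import threading

END = object()  # sentinel marking end-of-data on the link


def run_pipeline(rows):
    # The queue plays the role of the link between two operators;
    # a small buffer means the consumer must run concurrently.
    link = queue.Queue(maxsize=2)
    results = []

    def producer():
        # Stage 1: transform each row and push it downstream.
        for row in rows:
            link.put(row * 2)
        link.put(END)

    def consumer():
        # Stage 2: aggregate rows as they arrive, without
        # waiting for stage 1 to finish its whole input.
        total = 0
        while True:
            row = link.get()
            if row is END:
                break
            total += row
        results.append(total)

    t1 = threading.Thread(target=producer)
    t2 = threading.Thread(target=consumer)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return results[0]


print(run_pipeline(range(5)))  # doubles 0..4 and sums them: prints 20
```

Both "stages" overlap in time even though there is only one data partition, which is the one form of parallelism a single-node configuration still gives you.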
Thanks
DSDexter
DSDexter wrote: Goutam, let's make life simpler; I am doing nothing but reframing Ray's reasoning. Data pipelining only comes into the picture when there is a flow of data from one stage (operator) to another. It does not come into the picture for a single stage. Regarding your second question: when you run a job on a single node, you can only achieve pipeline parallelism. That is the benefit that you get.
Your reply is also up to the point.
:)
When you have a one-node system, records will be processed sequentially...
Thanks!
DSDexter,
DSDexter wrote: Goutam, let's make life simpler; I am doing nothing but reframing Ray's reasoning. Data pipelining only comes into the picture when there is a flow of data from one stage (operator) to another. It does not come into the picture for a single stage. Regarding your second question: when you run a job on a single node, you can only achieve pipeline parallelism. That is the benefit that you get.
I agree with your reply to the 2nd question.
But regarding your reply to the 1st question: how is a job possible with only one stage? There must be more than one stage in a parallel job.
Last edited by goutam on Fri Jul 18, 2008 6:24 am, edited 1 time in total.
Goutam Sahoo