Sequential run

goutam
Premium Member
Posts: 109
Joined: Thu Jul 26, 2007 6:53 am

Post by goutam »

Which two of the following would cause a stage to process its incoming data sequentially?

1. The execution mode of the stage is sequential.
2. The stage follows a Sequential File stage and its preserve partitioning flag has been set to clear.
3. The stage follows a Sequential File stage and its partitioning type is set to Auto.
4. The stage has a constraint with a node pool containing one node.
Goutam Sahoo
DSDexter
Participant
Posts: 94
Joined: Wed Jul 11, 2007 9:36 pm
Location: Pune,India

Post by DSDexter »

1 and 4.

When is the interview? :D
Thanks
DSDexter
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Looks like a question from a (practice?) certification exam to me.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
goutam
Premium Member
Posts: 109
Joined: Thu Jul 26, 2007 6:53 am

Post by goutam »

DSDexter wrote: 1 and 4.

When is the interview? :D

Can you please explain why option 4? Can't we achieve pipeline parallelism by constraining a stage to one node?
Goutam Sahoo
DSDexter
Participant
Posts: 94
Joined: Wed Jul 11, 2007 9:36 pm
Location: Pune,India

Post by DSDexter »

Look at the question you asked. You asked about sequential processing, and I answered with that keyword in mind.
Thanks
DSDexter
goutam
Premium Member
Posts: 109
Joined: Thu Jul 26, 2007 6:53 am

Post by goutam »

DSDexter wrote: Look at the question you asked. You asked about sequential processing, and I answered with that keyword in mind.

OK. Can you please tell me why you chose option 4? I want to know the reason behind it.
Goutam Sahoo
DSDexter
Participant
Posts: 94
Joined: Wed Jul 11, 2007 9:36 pm
Location: Pune,India

Post by DSDexter »

The processing of data on a single node is always sequential. If you have two nodes, the job will run in parallel, but each of the two nodes will process its own data sequentially.

So if you constrain a stage to a node pool containing a single node (as in option 4), the stage is not instantiated on multiple nodes. Only a single instance of the Orchestrate operator is generated for that stage, which in turn generates a single UNIX process.

Refer to the DataStage documentation for more details on data partitioning and parallelism.
Thanks
DSDexter
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Because sequential processing is what you get on one node; a stage (operator) can create only one process on one processing node (unless it's a composite operator, which possibility was not canvassed in the question).

Pipeline parallelism is between stages (operators), not within one stage, therefore is outside the scope of the question.
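
As a rough illustration of the point above (a toy model, not DataStage or Orchestrate code), the number of processes a single stage gets can be sketched as "one per eligible node". The `Node`, `Stage`, and `process_count` names are hypothetical and exist only for this sketch; composite operators are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A processing node from a (hypothetical) configuration file."""
    name: str
    # Every node here belongs to the default pool "" unless stated otherwise.
    pools: frozenset = frozenset({""})


@dataclass(frozen=True)
class Stage:
    """A stage (operator) with the two settings discussed in this thread."""
    name: str
    sequential: bool = False   # execution mode set to sequential
    node_pool: str = ""        # node pool constraint; "" means the default pool


def process_count(stage, nodes):
    """Return how many processes this toy model gives the stage.

    Assumption: one process per node the stage is allowed to run on,
    and exactly one process when the execution mode is sequential.
    """
    if stage.sequential:
        return 1
    return sum(1 for node in nodes if stage.node_pool in node.pools)


# Two-node configuration; only node2 also belongs to the pool "single".
NODES = [Node("node1"), Node("node2", frozenset({"", "single"}))]

if __name__ == "__main__":
    for stage in (
        Stage("unconstrained"),
        Stage("sequential_mode", sequential=True),
        Stage("one_node_pool", node_pool="single"),
    ):
        print(stage.name, process_count(stage, NODES))
```

In this model the unconstrained stage gets two processes, while both the sequential-mode stage and the stage constrained to the one-node pool get a single process, which is why options 1 and 4 both lead to sequential processing within that stage.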
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
goutam
Premium Member
Posts: 109
Joined: Thu Jul 26, 2007 6:53 am

Post by goutam »

DSDexter wrote: The processing of data on a single node is always sequential. If you have two nodes, the job will run in parallel, but each of the two nodes will process its own data sequentially.

So if you constrain a stage to a node pool containing a single node (as in option 4), the stage is not instantiated on multiple nodes. Only a single instance of the Orchestrate operator is generated for that stage, which in turn generates a single UNIX process.

Refer to the DataStage documentation for more details on data partitioning and parallelism.

When the job runs with a two-node configuration file, both partition and pipeline parallelism come into the picture. There will be two UNIX processes for each stage (operator), and each UNIX process will process its incoming data in parallel (here pipeline parallelism applies). So how is the data processed sequentially? A stage processes its data sequentially when the stage itself is set to sequential mode in the stage properties, not when it is constrained to a node pool containing one node. Ray, please correct me if I am wrong.

Another point: when a parallel job is run with a one-node configuration file, will the job run in parallel? If yes, how do we benefit from a parallel job?
Goutam Sahoo
DSDexter
Participant
Posts: 94
Joined: Wed Jul 11, 2007 9:36 pm
Location: Pune,India

Post by DSDexter »

I think Ray's reply should resolve this topic now. There's nothing more left to explain.

Thanks Ray.....Your answers are always the ultimate ones. 8)
Thanks
DSDexter
goutam
Premium Member
Posts: 109
Joined: Thu Jul 26, 2007 6:53 am

Post by goutam »

DSDexter wrote:I think Ray's reply should resolve this topic now. There's nothing more left to explain.

Thanks Ray.....Your answers are always the ultimate ones. 8)

DSDexter, please correct me if I am wrong.
Goutam Sahoo
DSDexter
Participant
Posts: 94
Joined: Wed Jul 11, 2007 9:36 pm
Location: Pune,India

Post by DSDexter »

Goutam, let's make life simpler. I am doing nothing but restating Ray's idea.

Data pipelining only comes into the picture when there is a flow of data from one stage (operator) to another. It doesn't come into the picture for a single stage.

Regarding your second question: when you run a job on a single node, you can only achieve pipeline parallelism. That's the benefit you get.
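
To make the single-node case concrete, here is a minimal sketch in plain Python (not DataStage code) of two stages that each run as one instance but still overlap in time because rows flow between them through a queue. The function names, the doubling transform, and the use of threads are all assumptions made for illustration; the real engine uses separate processes.

```python
import queue
import threading

# Marker placed on a queue to signal that no more rows will arrive.
END_OF_DATA = object()


def source_stage(rows, out_link):
    """Upstream stage: emits rows one at a time, then the end marker."""
    for row in rows:
        out_link.put(row)
    out_link.put(END_OF_DATA)


def transform_stage(in_link, out_link):
    """Downstream stage: starts work as soon as the first row arrives."""
    while True:
        row = in_link.get()
        if row is END_OF_DATA:
            out_link.put(END_OF_DATA)
            break
        out_link.put(row * 2)  # hypothetical transformation


def run_pipeline(rows):
    """Run both stages concurrently, one instance each, and collect output."""
    link1 = queue.Queue(maxsize=2)  # small buffer so the stages must overlap
    link2 = queue.Queue()
    workers = [
        threading.Thread(target=source_stage, args=(rows, link1)),
        threading.Thread(target=transform_stage, args=(link1, link2)),
    ]
    for worker in workers:
        worker.start()

    results = []
    while True:
        row = link2.get()
        if row is END_OF_DATA:
            break
        results.append(row)

    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))
```

Each stage here handles its rows strictly one after another, so within a stage the processing is sequential. The only parallelism is between the two stages, which is the pipeline parallelism you can still get on one node.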
Thanks
DSDexter
Roopanwita
Participant
Posts: 125
Joined: Mon Sep 11, 2006 4:22 am
Location: India

Post by Roopanwita »

DSDexter wrote: Goutam, let's make life simpler. I am doing nothing but restating Ray's idea.

Data pipelining only comes into the picture when there is a flow of data from one stage (operator) to another. It doesn't come into the picture for a single stage.

Regarding your second question: when you run a job on a single node, you can only achieve pipeline parallelism. That's the benefit you get.

Your reply is also to the point. :)
When you have a one-node system, records are processed sequentially.

Thanks!
goutam
Premium Member
Posts: 109
Joined: Thu Jul 26, 2007 6:53 am

Post by goutam »

DSDexter wrote: Goutam, let's make life simpler. I am doing nothing but restating Ray's idea.

Data pipelining only comes into the picture when there is a flow of data from one stage (operator) to another. It doesn't come into the picture for a single stage.

Regarding your second question: when you run a job on a single node, you can only achieve pipeline parallelism. That's the benefit you get.

DSDexter,

I agree with your reply to the second question.
But regarding your reply to the first question, how is a job possible with only one stage? There must be more than one stage in a parallel job.
Last edited by goutam on Fri Jul 18, 2008 6:24 am, edited 1 time in total.
Goutam Sahoo
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Example: A job that uses a Surrogate Key Generator stage to initialize a state file has no links and therefore has only one stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.