Sequential run

Post by **goutam** » Fri Jul 18, 2008 1:32 am

Which two of the following would cause a stage to process its incoming data sequentially?

1. The execution mode of the stage is Sequential.
2. The stage follows a Sequential File stage and its Preserve Partitioning flag has been set to Clear.
3. The stage follows a Sequential File stage and its partitioning type is set to Auto.
4. The stage has a constraint to a node pool containing one node.

Post by **DSDexter** » Fri Jul 18, 2008 1:35 am

1 and 4.

When is the interview? :D

Post by **ray.wurlod** » Fri Jul 18, 2008 1:37 am

Looks like a question from a (practice?) certification exam to me.

Post by **goutam** » Fri Jul 18, 2008 1:41 am

Can you please explain why option 4? Can't we achieve pipeline parallelism by constraining a stage to one node?

Post by **DSDexter** » Fri Jul 18, 2008 1:48 am

Look at the question you asked: you asked about sequential processing, and I answered with that keyword in mind.

Post by **goutam** » Fri Jul 18, 2008 2:32 am

OK. Can you please tell me why you chose option 4? I want to know the reason behind it.

Post by **DSDexter** » Fri Jul 18, 2008 3:26 am

The processing of data on a single node is always sequential. If you have 2 nodes the job will run in parallel, but each of the two nodes will still process its own share of the data sequentially.

So if you constrain a stage to a node pool containing one node, the stage will not be replicated across nodes: only a single Orchestrate operator instance will be generated for that stage, which in turn will run as a single UNIX process.
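A minimal Python sketch of the idea in this reply, under stated assumptions: rows are split across nodes by round-robin partitioning (one of several partitioning methods, chosen here only for illustration), each partition is handled by its own worker, and inside a partition the rows are processed one after another. The names `round_robin_partition` and `run_stage` are hypothetical, and Python threads merely stand in for the per-node UNIX processes that the parallel engine would actually start.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def round_robin_partition(rows: List[int], num_nodes: int) -> List[List[int]]:
    """Hypothetical helper: deal rows out to one partition per node, round-robin."""
    partitions: List[List[int]] = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        partitions[i % num_nodes].append(row)
    return partitions


def run_stage(rows: List[int], num_nodes: int,
              transform: Callable[[int], int]) -> List[List[int]]:
    """Hypothetical stage: partitions run concurrently, rows within one partition run in order."""
    partitions = round_robin_partition(rows, num_nodes)

    def process_partition(partition: List[int]) -> List[int]:
        # Sequential loop: this worker handles its rows one at a time.
        return [transform(row) for row in partition]

    # One worker per node (threads stand in for per-node processes).
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        return list(pool.map(process_partition, partitions))


if __name__ == "__main__":
    rows = list(range(1, 7))
    print(run_stage(rows, 2, lambda r: r * 10))  # [[10, 30, 50], [20, 40, 60]]
    print(run_stage(rows, 1, lambda r: r * 10))  # [[10, 20, 30, 40, 50, 60]]
```

With `num_nodes=1` there is a single partition and a single worker, so every row passes through the stage in sequence, which is the point being made about option 4.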

Refer to the DataStage documentation for more details on data partitioning and parallelism.

Post by **ray.wurlod** » Fri Jul 18, 2008 3:26 am

Because sequential processing is what you get on one node; a stage (operator) can create only one process on one processing node (unless it's a composite operator, which possibility was not canvassed in the question).

Pipeline parallelism is between stages (operators), not within one stage, and is therefore outside the scope of the question.
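The rule in the first paragraph of this reply can be written as a small counting model. This is a simplified Python sketch, not how the parallel engine actually builds its process tree: it assumes non-composite operators, ignores any partitioner, collector, or buffer operators the engine may insert, and counts one process per processing node the stage runs on. `Stage`, `processes_for_stage`, and the stage names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stage:
    name: str
    sequential: bool = False              # execution mode set to Sequential
    node_pool_size: Optional[int] = None  # None means no node pool constraint


def processes_for_stage(stage: Stage, config_nodes: int) -> int:
    """Simplified model: one process per processing node the stage runs on."""
    if stage.sequential:
        return 1                          # sequential mode: one process only
    if stage.node_pool_size is not None:
        return stage.node_pool_size       # limited to the nodes in the pool
    return config_nodes                   # otherwise every node in the config file


if __name__ == "__main__":
    for stage in (Stage("StageA"),
                  Stage("StageB", sequential=True),
                  Stage("StageC", node_pool_size=1)):
        print(stage.name, processes_for_stage(stage, config_nodes=4))
    # StageA 4
    # StageB 1
    # StageC 1
```

Under this model, options 1 and 4 both leave the stage with a single process, i.e. sequential processing, even on a four-node configuration.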

Post by **goutam** » Fri Jul 18, 2008 4:00 am

When the job runs with a two-node configuration file, both partition and pipeline parallelism come into the picture. There will be 2 UNIX processes for each stage or operator, and each UNIX process will process its incoming data in parallel (here pipeline parallelism applies). So how is the data processed sequentially? The stage processes the data sequentially when the stage itself is set to sequential mode in the stage properties, not when it is constrained to a node pool having one node. Ray, please correct me if I am wrong.

Another point: when a parallel job is run with a one-node configuration file, does the parallel job still run in parallel? If yes, how do we benefit from a parallel job?

Post by **DSDexter** » Fri Jul 18, 2008 4:18 am

I think Ray's reply should resolve this topic now. There's nothing more left to explain.

Thanks, Ray. Your answers are always the ultimate ones.

Post by **goutam** » Fri Jul 18, 2008 4:52 am

DSDexter, please, please correct me if I am wrong.

Post by **DSDexter** » Fri Jul 18, 2008 5:08 am

Goutam, let's make life simpler. I am doing nothing but reframing Ray's ideology.

Data pipelining only comes into the picture when there is a flow of data from one stage (operator) to another. It doesn't come into the picture for a single stage.

Regarding your second question: when you run a job on a single node, you can only achieve pipeline parallelism. That's the benefit you get.
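To illustrate pipeline parallelism on a single node, here is a minimal Python sketch under stated assumptions: three hypothetical stages (`producer`, `transformer`, `consumer`) each run in their own thread, standing in for the one operator process per stage that a one-node configuration would give, and small bounded queues stand in for the links between them. Each stage still handles its rows one at a time; the parallelism comes only from the stages overlapping in time, with downstream stages working on early rows while upstream stages are still producing later ones.

```python
import queue
import threading
from typing import List

END_OF_DATA = None  # marker passed down a link when the upstream stage is finished


def producer(out_link: queue.Queue, n: int) -> None:
    """Hypothetical source stage: emits rows 0..n-1."""
    for row in range(n):
        out_link.put(row)
    out_link.put(END_OF_DATA)


def transformer(in_link: queue.Queue, out_link: queue.Queue) -> None:
    """Hypothetical middle stage: doubles each row as soon as it arrives."""
    while True:
        row = in_link.get()
        if row is END_OF_DATA:
            out_link.put(END_OF_DATA)
            return
        out_link.put(row * 2)


def consumer(in_link: queue.Queue, sink: List[int]) -> None:
    """Hypothetical target stage: collects rows in arrival order."""
    while True:
        row = in_link.get()
        if row is END_OF_DATA:
            return
        sink.append(row)


def run_pipeline(n: int) -> List[int]:
    # Small bounded queues play the role of link buffers, so the stages must overlap.
    link1: queue.Queue = queue.Queue(maxsize=2)
    link2: queue.Queue = queue.Queue(maxsize=2)
    sink: List[int] = []
    stages = [
        threading.Thread(target=producer, args=(link1, n)),
        threading.Thread(target=transformer, args=(link1, link2)),
        threading.Thread(target=consumer, args=(link2, sink)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return sink


if __name__ == "__main__":
    print(run_pipeline(5))  # [0, 2, 4, 6, 8]
```

A job with only one stage has no link to overlap across, which is why pipelining does not apply within a single stage.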

Post by **Roopanwita** » Fri Jul 18, 2008 6:10 am

Your reply is also to the point.

When you have a one-node system, records will be processed sequentially.

Thanks!

Post by **goutam** » Fri Jul 18, 2008 6:18 am

DSDexter,

I agree with your reply to my 2nd question.
But regarding your reply to the 1st question, how is a job possible with only one stage? There must be more than one stage in a parallel job.

Post by **ray.wurlod** » Fri Jul 18, 2008 6:21 am

Example: A job that uses a Surrogate Key Generator stage to initialize a state file has no links and therefore has only one stage.