
One stage is sequential in parallel job

Posted: Thu Feb 23, 2012 9:17 am
by vishu19aug
Hi,

In my parallel job, one stage (a Transformer) runs sequentially. Does that have any negative impact on performance? Will it be a bottleneck for the complete parallel flow?

Thanks,
Vishal

Posted: Thu Feb 23, 2012 9:31 am
by mandyli
Hi,

Have you developed the job and tested it with any data?

Without knowing anything about the job, how can we say anything about its performance?

Is it a simple job with a low number of rows, or a huge volume?

Thanks
Man

Posted: Thu Feb 23, 2012 11:06 am
by vishu19aug
Yes, I did, but only with very little data. However, my job will process around 1 million records.

Posted: Thu Feb 23, 2012 12:39 pm
by jwiles
Is the transformer sequential for a certain reason? Does it HAVE to be sequential in order to meet business logic requirements?

Any sequentially operating stage in a parallel job has the potential to be a bottleneck, depending on the job design and the data being processed, but that doesn't mean it will be one.

Regards,

Posted: Thu Feb 23, 2012 2:12 pm
by vishu19aug
I have the following logic to compare the names of two consecutive records. The file is sorted on name and rownumber, because there can be two groups with the same name separated by a group with a different name. The logic below does not work correctly if the stage runs in parallel:

StageVar1 = Input.name
StageVar2 = (if StageVar1 = StageVar3 then StageVar4 else StageVar4 + 1)
StageVar3 = StageVar1
StageVar4 = StageVar2
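
For readers who don't know Transformer stage variables, here is a rough Python sketch of what the derivations above appear to compute: a counter that goes up by one whenever the name differs from the previous row's name. The function name `assign_group_ids` and the starting values (no previous name, counter at 0) are assumptions for illustration only; they are not taken from the job.

```python
def assign_group_ids(names):
    """Return one group number per row, in input order.

    The number increases by one each time the name differs from the
    name on the previous row (assumed initial state: no previous name,
    counter starting at 0).
    """
    previous_name = None  # plays the role of StageVar3
    group_id = 0          # plays the role of StageVar4
    result = []
    for name in names:    # name plays the role of StageVar1
        if name != previous_name:
            group_id += 1  # the "else StageVar4 + 1" branch
        previous_name = name
        result.append(group_id)
    return result


if __name__ == "__main__":
    # Two "A" groups separated by a "B" group get different numbers.
    print(assign_group_ids(["A", "A", "B", "A"]))
```

Because each result depends on the previous row, the outcome depends on the rows reaching the stage in a single, known order.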

Posted: Thu Feb 23, 2012 3:06 pm
by ray.wurlod
Get the partitioning right and it WILL work properly in parallel. A sequential stage in a parallel job will always be a bottleneck.
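
As a rough illustration of key-based partitioning, assuming that "getting the partitioning right" means hash partitioning on the name column, the Python sketch below sends every row with the same name to the same partition while keeping the input order inside each partition. The helper `partition_by_name` and the use of `zlib.crc32` as the hash are hypothetical stand-ins, not what DataStage does internally.

```python
import zlib


def partition_by_name(rows, num_partitions):
    """Distribute rows over partitions by a stable hash of the name.

    Rows with equal names always land in the same partition, and the
    relative order of rows inside a partition follows the input order.
    """
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # crc32 is used only because it is stable across runs;
        # it stands in for whatever hash the engine really applies.
        index = zlib.crc32(row["name"].encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return partitions


if __name__ == "__main__":
    rows = [
        {"name": "A", "rownumber": 1},
        {"name": "A", "rownumber": 2},
        {"name": "B", "rownumber": 3},
        {"name": "A", "rownumber": 4},
    ]
    for number, part in enumerate(partition_by_name(rows, 2)):
        print(number, part)
```

Note that two groups with the same name can end up adjacent inside a partition when the group that separated them goes to a different partition, so a per-partition counter may number groups differently from a sequential run. Whether that is acceptable depends on the business requirement.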