why is external filter not running in parallel

Posted: Wed Aug 01, 2007 7:31 am
by evans036
even after setting the stage->advanced->execution mode = 'parallel' i cannot get my external filter to run in parallel.

is this a known issue/bug?

thanks,

steve

Posted: Wed Aug 01, 2007 7:47 am
by balajisr
You are right.

I tried giving an explicit partitioning method (Round Robin). My job aborted with the following message:
Error when checking operator: Input data set on port 0 has a partition method, but the operator is not parallel.
The manual specifies that it can run both sequentially and in parallel. Looks like a bug.

By default (Auto) it runs sequentially.

Posted: Wed Aug 01, 2007 7:52 am
by evans036
thanks balajisr for the info.

is anyone using a version of datastage where the external filter does run in parallel?

i'm on v7.5.2

thanks,

steve

Posted: Thu Aug 02, 2007 7:52 am
by evans036
i have logged a case with IBM on this.

i also note that the 'external target' stage does not parallelize either

i will keep you all posted

thanks,
steve

Posted: Thu Aug 02, 2007 8:24 am
by JoshGeorge
Even if you explicitly specify "Parallel", this operator runs in sequential mode when you try to run any OS command. A potential bug, I think. This topic has been covered in earlier threads.

Posted: Thu Aug 02, 2007 3:29 pm
by ray.wurlod
It's not necessarily a bug. Some stage types simply don't have parallel execution capability. Others that do have Sequential as their default execution mode, usually for obvious reasons. For example, a Sequential File stage with an input link and only one file named can generate only one process; this is an operating system restriction, not a DataStage one.

However, the External Filter stage can operate in parallel execution mode unless features of the job design prevent this. Go to the Advanced tab of the stage properties; in the leftmost frame you can set the execution mode. Whether it can then execute in parallel will again depend on the external command specified. For example, cat - (which performs no filtering whatsoever) will happily run in parallel.
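To illustrate the kind of command that does not care how many copies of it are running, here is a minimal sketch of a stateless line filter: it reads rows on stdin, decides on each row independently, and writes the survivors to stdout, so each copy only ever needs its own partition of the data. The script and the function name filter_lines are hypothetical and not part of DataStage, and whether a given release actually runs such a command in parallel is a separate question.

```python
import sys


def filter_lines(lines, needle):
    """Yield only the lines that contain needle.

    Each line is judged on its own, so the result does not depend on
    which other rows this particular copy of the filter happens to see.
    """
    for line in lines:
        if needle in line:
            yield line


if __name__ == "__main__":
    # Hypothetical usage as a filter command: python line_filter.py TEXT
    needle = sys.argv[1] if len(sys.argv) > 1 else ""
    sys.stdout.writelines(filter_lines(sys.stdin, needle))
```

A command that has to see every row (a sort, a global count, a de-duplication) would not have this property.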

Posted: Mon Aug 06, 2007 5:51 am
by evans036
IBM have been looking at this issue and they have been able to recreate it in their labs.

so i guess this is a bug

their suggested workaround (which i need to look into) is to use a wrapped stage

if i get further info i will post it here

DS version is 7.5.2

thanks,

steve

Posted: Thu Aug 23, 2007 6:17 am
by evans036
fyi...

IBM have sent me a patch for this problem. i have not had good experiences with DataStage patches from IBM so i might stick to the wrapped stage for now and not apply the patch.

thanks,

steve

Posted: Fri Oct 19, 2007 10:49 am
by ivannavi
ray.wurlod said: "For example cat - (which performs no filtering whatsoever) will happily run in parallel."

I read a bunch of *.csv files using External Source stage with cat /path/*.
The link from this is forwarded to a peek stage.
In the monitor in Director, the Peek stage shows three nodes (as per the configuration file), but the External Source stage runs sequentially.
Is this the bug that evans036 mentions a patch being available for?
Or am I doing something wrong?

Posted: Fri Oct 19, 2007 2:02 pm
by evans036
ivannavi said: "Is this the bug that evans036 mentions a patch being available for?"

It might be related. I was specifically using the 'external filter', which i can imagine would have common functionality.

in your case though, parallelism would mean every parallel stream runs the same cat /path/* and so reads all of your csv files in their entirety. You would then get each row once per stream, so you would need to be sure to deal with duplicates if the csv data is a primary input.

Maybe that's not what you want?

good luck,
steve

Posted: Mon Oct 22, 2007 2:16 am
by ivannavi
I was hoping each node would pick up its own set of files.
It would be great to see an example of a command or program that the External Source stage accepts for parallel execution, along with any prerequisites.
If anyone has it working...
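
For what it's worth, the "each node picks up its own set of files" idea can be sketched outside DataStage. The following is a rough Python sketch only, not something tested in an External Source stage: node_index and node_count are hypothetical parameters, and how (or whether) DataStage would hand those values to the command is an assumption that this thread does not settle.

```python
import glob
import sys


def files_for_node(paths, node_index, node_count):
    """Return the slice of paths that one node should read.

    Paths are sorted first so every node computes the same ordering,
    then dealt out round-robin: node 0 takes items 0, n, 2n, ...
    """
    ordered = sorted(paths)
    return ordered[node_index::node_count]


def emit_files(pattern, node_index, node_count, out):
    """Write the contents of this node's files to out, one file after another."""
    for path in files_for_node(glob.glob(pattern), node_index, node_count):
        with open(path, "r") as handle:
            out.write(handle.read())


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python node_cat.py '/path/*.csv' NODE_INDEX NODE_COUNT
    emit_files(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]), sys.stdout)
```

Because the slices are disjoint and together cover every file, no row would be read twice, which avoids the duplicate problem steve describes for a plain cat /path/* run on every node.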