why is external filter not running in parallel
Posted: Wed Aug 01, 2007 7:31 am
by evans036
even after setting the stage->advanced->execution mode = 'parallel' i cannot get my external filter to run in parallel.
is this a known issue/bug?
thanks,
steve
Posted: Wed Aug 01, 2007 7:47 am
by balajisr
You are right.
I tried giving an explicit partitioning method (Round Robin). My job aborted with the following message:
Error when checking operator: Input data set on port 0 has a partition method, but the operator is not parallel.
The manual specifies that it can run in both sequential and parallel mode. Looks like a bug.
By default (Auto) it runs sequentially.
Posted: Wed Aug 01, 2007 7:52 am
by evans036
thanks balajisr for the info.
is anyone using a version of datastage where the external filter does run in parallel?
i'm on v7.5.2
thanks,
steve
Posted: Thu Aug 02, 2007 7:52 am
by evans036
i have logged a case with IBM on this.
i also note that the 'external target' stage does not parallelize either
i will keep you all posted
thanks,
steve
Posted: Thu Aug 02, 2007 8:24 am
by JoshGeorge
Even if you explicitly specify "Parallel", this operator runs in sequential mode when you try to run any OS commands. A potential bug, I think. This topic was covered previously. You can see one of the topics HERE.
Posted: Thu Aug 02, 2007 3:29 pm
by ray.wurlod
It's not necessarily a bug. Some stage types simply don't have parallel execution capability. Others that do have it default to Sequential execution mode, usually for obvious reasons. For example, a Sequential File stage with an input link and only one File named can only generate one process - this is an operating system restriction, not a DataStage one.
However, the External Filter stage can operate in parallel execution mode unless there are features of the job design that prevent this. Go to the Advanced tab of the stage properties; in the leftmost frame you can set the execution mode. Whether it can then execute in parallel will again depend on the external command specified. For example, cat - (which performs no filtering whatsoever) will happily run in parallel.
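To illustrate the point about cat -, here is a rough shell-only sketch, nothing DataStage-specific. It assumes the external command simply reads rows on stdin and writes rows on stdout, so that each partition could be piped through its own copy of the command. The function names and sample rows are made up for the example.

```shell
# Pass-through filter: copies stdin to stdout unchanged, like `cat -`.
passthrough() {
  cat -
}

# A filter that actually drops rows: keep only lines containing "ERROR".
# (Hypothetical filter; `|| true` keeps the exit status 0 when nothing matches.)
keep_errors() {
  grep 'ERROR' || true
}

# Simulate two partitions, each piped through its own copy of the filter.
part0=$(printf 'a\nERROR b\n' | keep_errors)
part1=$(printf 'ERROR c\nd\n' | keep_errors)
printf '%s\n%s\n' "$part0" "$part1"
```

Because each copy only sees its own stdin, the copies do not need to know about each other. Whether the stage actually starts more than one copy is a separate question.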
Posted: Mon Aug 06, 2007 5:51 am
by evans036
IBM have been looking at this issue and they have been able to recreate it in their labs.
so i guess this is a bug
their suggested workaround (which i need to look into) is to use a wrapped stage
if i get further info i will post it here
DS version is 7.5.2
thanks,
steve
Posted: Thu Aug 23, 2007 6:17 am
by evans036
fyi...
IBM have sent me a patch for this problem. i have not had good experiences with DataStage patches from IBM so i might stick to the wrapped stage for now and not apply the patch.
thanks,
steve
Posted: Fri Oct 19, 2007 10:49 am
by ivannavi
ray.wurlod said:
"For example cat - (which performs no filtering whatsoever) will happily run in parallel."
I read a bunch of *.csv files using External Source stage with cat /path/*.
The link from this is forwarded to a peek stage.
In the Director monitor, the Peek stage shows three nodes (as per the configuration file), but the External Source stage runs sequentially.
Is this the bug that evans036 mentions a patch being available for?
Or am I doing something wrong?
Posted: Fri Oct 19, 2007 2:02 pm
by evans036
"Is this the bug that evans036 mentions a patch being available for?"
It might be related. I was specifically using the 'external filter' which i can imagine would have common functionality.
in your case though, parallelism would force your csv files to be read in their entirety once for each parallel stream. You would also need to be sure to deal with duplicates if the csv data is a primary input.
Maybe that's not what you want?
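A small shell sketch of why that happens, assuming every node simply runs the same cat command over the same files. The temp directory, file names and node count are invented for the illustration.

```shell
# Throwaway directory with two small csv files (hypothetical sample data).
dir=$(mktemp -d)
printf '1,a\n2,b\n' > "$dir/one.csv"
printf '3,c\n' > "$dir/two.csv"

# Rows actually present in the files.
source_rows=$(cat "$dir"/*.csv | wc -l | tr -d ' ')

# Run the same source command once per simulated node and add up the rows.
nodes=3
total=0
n=0
while [ "$n" -lt "$nodes" ]; do
  rows=$(cat "$dir"/*.csv | wc -l | tr -d ' ')
  total=$((total + rows))
  n=$((n + 1))
done

echo "rows in files: $source_rows, rows emitted by $nodes nodes: $total"
rm -rf "$dir"
```

Every simulated node emits the full set of rows, so the total is the file row count multiplied by the number of nodes.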
good luck,
steve
Posted: Mon Oct 22, 2007 2:16 am
by ivannavi
I was hoping each node would pick up its own set of files.
An example of a command or program that the External Source stage can run in parallel, along with any prerequisites, would be great.
If anyone has it working...
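For what it's worth, at the shell level "each node picks up its own set of files" could be sketched as a round-robin split of the file list by node index. This is only an assumption-laden sketch: files_for_node, the node index and the node count are made-up names, and nothing in this thread confirms how (or whether) the External Source stage hands a partition number to the command.

```shell
# Print the files one worker should read: every node_count-th file,
# starting at node_index (0-based). Both arguments are hypothetical inputs.
files_for_node() {
  node_index=$1
  node_count=$2
  shift 2
  i=0
  for f in "$@"; do
    if [ $((i % node_count)) -eq "$node_index" ]; then
      printf '%s\n' "$f"
    fi
    i=$((i + 1))
  done
}

# Example with three workers and five placeholder file names.
for n in 0 1 2; do
  echo "node $n: $(files_for_node "$n" 3 a.csv b.csv c.csv d.csv e.csv | tr '\n' ' ')"
done
```

Each worker would then cat only the files printed for its own index, so no file is read twice.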