why is external filter not running in parallel
Even after setting Stage → Advanced → Execution mode = 'Parallel', I cannot get my External Filter stage to run in parallel.
Is this a known issue/bug?
Thanks,
Steve
You are right. By default (Auto) it runs sequentially. The manual specifies that it can run both sequentially and in parallel, so this looks like a bug.
I tried giving an explicit partitioning method (Round Robin). My job aborted with the following message:
Error when checking operator: Input data set on port 0 has a partition method, but the operator is not parallel.
Even if you explicitly specify "Parallel", this operator runs in sequential mode when you try to run any OS command. A potential bug, I think. This topic was covered previously; you can see one of the earlier topics HERE.
Joshy George
It's not necessarily a bug. Some stage types simply don't have parallel execution capability. Others that do have Sequential as their default execution mode, usually for obvious reasons. For example, a Sequential File stage with an input link and only one file named can only generate one process; this is an operating system restriction, not a DataStage one.
However, the External Filter stage can operate in parallel execution mode unless there are features of the job design that prevent this. Go to the Advanced tab of the stage properties; in the leftmost frame you can set the execution mode. Whether it can then execute in parallel will again depend on the external command specified. For example, cat - (which performs no filtering whatsoever) will happily run in parallel.
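To illustrate the point about stateless filters, here is a rough shell simulation (not DataStage itself; the two-node layout and file names are made up for the sketch): each partition pipes only its own slice of the rows through the same external command, so a stateless, per-row filter like grep parallelizes cleanly.

```shell
#!/bin/sh
# Simulate two DataStage partitions, each piping its own slice of the
# data through the same external filter command (here: grep).
printf 'alpha\nbeta\n' > part0.txt    # rows routed to node 0
printf 'gamma\nbeta\n' > part1.txt    # rows routed to node 1

# Each "node" runs the filter independently on its own partition:
grep beta part0.txt > out0.txt &
grep beta part1.txt > out1.txt &
wait

# Combined result is the same as filtering the whole data set once:
cat out0.txt out1.txt
```

A filter that needs to see all rows at once (for example, one that sorts or deduplicates across the whole input) would not partition this way, which is one reason a stage may be forced to run sequentially.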
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod said:
For example, cat - (which performs no filtering whatsoever) will happily run in parallel.

I read a bunch of *.csv files using an External Source stage with cat /path/*.
The link from this is forwarded to a Peek stage.
In the Director monitor, Peek shows three nodes (as per the configuration file), but the External Source stage runs sequentially.
Is this the bug that evans036 mentions a patch being available for?
Or am I doing something wrong?
It might be related. I was specifically using the External Filter stage, which I can imagine shares common functionality.
In your case, though, parallelism would force your CSV files to be read in their entirety once for each parallel stream. You would also need to be sure to deal with duplicates if the CSV is a primary input.
Maybe that's not what you want?
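To see why, here is a hypothetical shell sketch (directory name and a 3-node count are invented for illustration): if each of the parallel streams independently runs the same cat /path/*.csv source command, every row is emitted once per stream.

```shell
#!/bin/sh
# Hypothetical: 3 parallel streams each run the same source command.
mkdir -p csvdir
printf 'id,val\n1,a\n' > csvdir/one.csv

NODES=3
for n in $(seq 1 $NODES); do
  cat csvdir/*.csv          # each stream reads ALL files in full
done | sort | uniq -c       # the counts reveal the 3x duplication
```

Every input row shows up NODES times in the combined output, which is exactly the duplicate problem to watch for if such a source feeds a primary input.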
good luck,
steve