why is external filter not running in parallel

evans036
Premium Member
Posts: 72
Joined: Tue Jan 31, 2006 11:13 pm

Post by evans036 »

Even after setting Stage -> Advanced -> Execution mode to 'Parallel', I cannot get my External Filter stage to run in parallel.

Is this a known issue/bug?

thanks,

steve
balajisr
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

You are right.

I tried giving it an explicit partitioning method (Round Robin). My job aborted with the following message:
Error when checking operator: Input data set on port 0 has a partition method, but the operator is not parallel.
The manual says that the stage can run in both sequential and parallel mode. Looks like a bug.

By default (Auto) it runs sequentially.
evans036
Premium Member
Posts: 72
Joined: Tue Jan 31, 2006 11:13 pm

Post by evans036 »

Thanks, balajisr, for the info.

Is anyone using a version of DataStage where the External Filter stage does run in parallel?

I'm on v7.5.2.

thanks,

steve
evans036
Premium Member
Posts: 72
Joined: Tue Jan 31, 2006 11:13 pm

Post by evans036 »

I have logged a case with IBM on this.

I also note that the External Target stage does not parallelize either.

I will keep you all posted.

thanks,
steve
JoshGeorge
Participant
Posts: 612
Joined: Thu May 03, 2007 4:59 am
Location: Melbourne

Post by JoshGeorge »

Even if you explicitly specify "Parallel" this operator runs in sequential mode when you try to run any OS commands. A potential bug I think. This topic was covered previously. You can see one of the topics HERE
Joshy George
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It's not necessarily a bug. Some stage types simply don't have parallel execution capability. Others that do have their default execution mode as Sequential, usually for obvious reasons. For example a Sequential File stage with an input link and only one File named can only generate one process - this is an operating system restriction, not a DataStage one.

However, the External Filter stage can operate in parallel execution mode unless there are features of the job design that prevent this. Go to the Advanced tab of the stage properties; in the leftmost frame you can set the execution mode. Whether it then can execute in parallel will again depend on the external command specified. For example cat - (which performs no filtering whatsoever) will happily run in parallel.
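
To illustrate the kind of command that should have no trouble running in parallel, here is a minimal sketch in Python. It assumes the stage simply feeds each partition's records to the command's standard input and reads its standard output; the keep() rule is made up purely for illustration. Each line is judged on its own and nothing is remembered between lines, so several copies can work side by side on different partitions.

```python
import sys


def keep(line):
    # Hypothetical rule: keep non-empty lines that do not start with '#'.
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith("#")


def filtered(lines):
    # Yield the lines that pass keep(); no state is carried across lines.
    for line in lines:
        if keep(line):
            yield line


if __name__ == "__main__":
    # Behaves like a filtering version of `cat -`: stdin in, stdout out.
    for kept in filtered(sys.stdin):
        sys.stdout.write(kept)
```

Saved under some name of your choosing, a script like this would be entered as the filter command in place of cat -.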
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
evans036
Premium Member
Posts: 72
Joined: Tue Jan 31, 2006 11:13 pm

Post by evans036 »

IBM have been looking at this issue and have been able to recreate it in their labs.

So I guess this is a bug.

Their suggested workaround (which I need to look into) is to use a wrapped stage.

If I get further info I will post it here.

DS version is 7.5.2.

thanks,

steve
evans036
Premium Member
Posts: 72
Joined: Tue Jan 31, 2006 11:13 pm

Post by evans036 »

FYI...

IBM have sent me a patch for this problem. I have not had good experiences with DataStage patches from IBM, so I might stick to the wrapped stage for now and not apply the patch.

thanks,

steve
ivannavi
Premium Member
Posts: 120
Joined: Mon Mar 07, 2005 9:49 am
Location: Croatia

Post by ivannavi »

ray.wurlod wrote:
"For example cat - (which performs no filtering whatsoever) will happily run in parallel."


I read a bunch of *.csv files using an External Source stage with cat /path/*.
The link from this stage goes to a Peek stage.
In the Director monitor, the Peek stage shows three nodes (as per the configuration file), but the External Source stage runs sequentially.
Is this the bug that evans036 mentions a patch being available for?
Or am I doing something wrong?
evans036
Premium Member
Posts: 72
Joined: Tue Jan 31, 2006 11:13 pm

Post by evans036 »

ivannavi wrote:
"Is this the bug that evans036 mentions a patch being available for?"

It might be related. I was specifically using the External Filter stage, which I imagine shares functionality with External Source.

In your case, though, running in parallel would mean every parallel stream runs the same cat command, so your CSV files would be read in their entirety once for each stream. You would also need to be sure to deal with the resulting duplicates if the CSV data is a primary input.

Maybe that's not what you want?
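
If it helps, here is a rough sketch (Python, not tested inside DataStage) of how each stream could take only its own share of the files instead of all of them. The function names and the usage line are made up for the example. The partition number and partition count are passed as plain arguments; how the stage would actually supply them is an assumption on my part. I believe the parallel framework exposes APT_PARTITION_NUMBER and APT_PARTITION_COUNT to subprocesses, but check that against the documentation for your version before relying on it.

```python
import glob
import sys


def files_for_partition(paths, partition, count):
    """Return the paths assigned to one partition (round robin over the sorted list)."""
    if count < 1 or not 0 <= partition < count:
        raise ValueError("need count >= 1 and 0 <= partition < count")
    return [p for i, p in enumerate(sorted(paths)) if i % count == partition]


def emit(paths, out):
    # Same effect as `cat`, but only over this partition's files.
    for path in paths:
        with open(path, "r") as handle:
            for line in handle:
                out.write(line)


if __name__ == "__main__":
    if len(sys.argv) == 4:
        pattern = sys.argv[1]
        partition, count = int(sys.argv[2]), int(sys.argv[3])
        emit(files_for_partition(glob.glob(pattern), partition, count), sys.stdout)
    else:
        print("usage: PATTERN PARTITION COUNT", file=sys.stderr)
```

With three nodes, the first stream would be called with the arguments '/path/*.csv' 0 3, the second with 1 3, and so on. Because the list is sorted before it is split, every stream sees the same ordering and no file is picked up twice.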

good luck,
steve
ivannavi
Premium Member
Posts: 120
Joined: Mon Mar 07, 2005 9:49 am
Location: Croatia

Post by ivannavi »

I was hoping each node would pick up its own set of files.
An example of a command or program (and its prerequisites) that the External Source stage expects in order to run in parallel would be great.
If anyone has it working...