How to build an external filter stage



sidharth
Participant
Posts: 19
Joined: Mon Nov 07, 2005 1:47 am


Post by sidharth »

Hi All,
I am trying to understand how to use an External Filter stage. I know the situation described below is simple and could be handled with a normal Filter stage; I have chosen it only to keep the study simple. I would appreciate your help with it.


Input 
----- 
id    val 
--    --- 
1   A 
2   B 
3   C 
4   A 
5   B 
6   A 
7   A 
8   C 
9   C 

I want to pass only the records with val="A" 

Output 
------ 

id    val 
--    --- 
1   A 
4   A 
6   A 
7   A 


> My DS job: Seq File ----> External Filter Stage -----> Peek 

> I plan to use a shell script (grepcmd.sh) as the filter command in the External Filter stage. 

grepcmd.sh => cat $1 | grep "A" 

> In the External Filter stage, the following options are set: 
   Filter command = grepcmd.sh 
   Arguments = val 
# Do I need to pass all the columns as arguments to the shell script (stdin), even though the filter criterion is on a single column (val)?
# Do I need to output (stdout) all the columns from the shell script?


With the above setup, I get the following error: 
External_Filter_1,0: cat: val: The system cannot find the file specified. 
External_Filter_1,0: Wrapped Unix command 'grepcmd.sh val' terminated with error, exit code 1.External_Filter_1 
External_Filter_1,0: subprocess failed with exit code 1,External_Filter_1 
External_Filter_1,0: Operator's runLocally() failed. 
I know I am missing something somewhere; please help me complete my understanding.

Many thanks,

-Sid
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Somewhere in there you're going to need some reference to stdin. For example cat - | grep 'A'. However, obviously this grep won't do it in a generic sense; you need some way to identify just the 'A' in your val column; perhaps you need to pass the column number or some other means of recognizing the val column, and use a slightly more complex command.
For the example cited, grep 'A' would work, since it is the only 'A' in the row.
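To illustrate the difference between matching anywhere in the row and matching only the val column, here is a minimal sketch. It assumes the stage streams each row to the command's stdin as comma-delimited, unquoted text (id,val); the actual delimiter and quoting depend on how the stage's input format is defined, so adjust the -F value accordingly. The function names are made up for the illustration.

```shell
#!/bin/sh
# Sample rows as the stage might stream them on stdin.
# Assumption: comma-delimited, unquoted fields (id,val).
sample_rows() {
    printf '%s\n' '1,A' '2,B' '3,C' '4,A' '5,B' '6,A' '7,A' '8,C' '9,C'
}

# Whole-line match: passes any row that contains an "A" anywhere.
filter_any_a() {
    grep 'A'
}

# Column-aware match: passes rows whose second field is exactly "A".
filter_val_a() {
    awk -F',' '$2 == "A"'
}

# Both filters read stdin and write complete rows to stdout.
sample_rows | filter_val_a
```

Both filters write whole rows back to stdout, not just the val column. One practical difference: grep exits with status 1 when no row matches, which the log above suggests the stage may treat as a failure, while awk exits 0 in that case.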
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

If $1 is the column passed, try:


echo $1 | grep "A"
or
grep "A" $1
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
splayer
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

I am trying to reproduce this job. I created the sequential file without the headers. I also set grep as the Filter Command for the External_Filter stage. But I am not clear on what should go into grepcmd.sh or what the arguments for the External Filter stage should be.

I would appreciate any help.
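One possible shape for grepcmd.sh, offered as a sketch and not a confirmed solution from this thread: the script reads rows from stdin and takes the column number and the wanted value as its arguments. This assumes the stage appends the Arguments text to the command line (consistent with the 'grepcmd.sh val' in the error log above) and that rows arrive comma-delimited; the helper name filter_by_column and the argument order are hypothetical.

```shell
#!/bin/sh
# Hypothetical grepcmd.sh: filter stdin rows by one column.
# Assumed usage: grepcmd.sh <column-number> <value>
# Assumption: rows arrive comma-delimited on stdin.
filter_by_column() {
    col=$1
    want=$2
    # -v hands the shell values to awk as awk variables.
    awk -F',' -v col="$col" -v want="$want" '$col == want'
}

# When run as a script, the stage's Arguments become $1 and $2.
if [ "$#" -ge 2 ]; then
    filter_by_column "$1" "$2"
fi
```

Under these assumptions the stage options would be Filter command = grepcmd.sh and Arguments = 2 A. The script would also need to be executable and reachable from the engine's PATH, or be given with its full path.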