
How to build an external filter stage

Posted: Fri Oct 06, 2006 4:00 am
by sidharth
Hi All,
I am trying to understand how to use an External Filter stage. I know the situation described below is a simple one that could be handled with a normal Filter stage; I have chosen it only to keep the study simple. I would appreciate your help with it.

Code:

Input 
----- 
id    val 
--    --- 
1   A 
2   B 
3   C 
4   A 
5   B 
6   A 
7   A 
8   C 
9   C 

I want to pass only the records with val="A" 

Output 
------ 

id    val 
--    --- 
1   A 
4   A 
6   A 
7   A 

Code:

> My DS Job   Seq File ----> External Filter Stage -----> Peek 

> I plan to use a shell script (grepcmd.sh) as the filter command in the External Filter Stage. 

grepcmd.sh => cat $1 | grep "A" 

> In the External Filter Stage, the following options are set: 
   Filter command = grepcmd.sh 
   Arguments = val 
# Do I need to pass all the columns as arguments to the shell script (stdin), even though the filter criterion is on a single column (val)?
# Do I need to output (stdout) all the columns from the shell script?

Code:

With the above setup, I get the following error: 
External_Filter_1,0: cat: val: The system cannot find the file specified. 
External_Filter_1,0: Wrapped Unix command 'grepcmd.sh val' terminated with error, exit code 1.External_Filter_1 
External_Filter_1,0: subprocess failed with exit code 1,External_Filter_1 
External_Filter_1,0: Operator's runLocally() failed. 
I know I am missing something somewhere; please help me complete my understanding.

Thanks a bunch,

-Sid

Posted: Fri Oct 06, 2006 6:29 am
by ray.wurlod
Somewhere in there you're going to need some reference to stdin, for example cat - | grep 'A'. However, this grep obviously won't do it in a generic sense: you need some way to identify just the 'A' in your val column. Perhaps you need to pass the column number, or use some other means of recognizing the val column, and use a slightly more complex command.
For the example cited, grep 'A' would work, since it is the only 'A' in the row.
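
A minimal sketch of the "slightly more complex command" idea, assuming the stage delivers each row to the command's stdin as delimited text. The comma delimiter, the function name filter_by_column and its three parameters are assumptions for illustration, not anything defined by DataStage; the real delimiter depends on how the stage formats its rows.

```shell
#!/bin/sh
# Sketch only: filter_by_column and its three parameters are hypothetical names.
# Reads rows from stdin and prints, unchanged, those rows whose field number
# $1 equals the value $2, using $3 as the field delimiter.
filter_by_column() {
    awk -F"$3" -v col="$1" -v want="$2" '$col == want'
}

# Example: keep rows whose second comma-separated field is exactly "A".
printf '1,A\n2,B\n3,C\n4,A\n' | filter_by_column 2 A ,
# prints:
# 1,A
# 4,A
```

Unlike a bare grep 'A', this compares one field only, so an 'A' appearing in some other column would not cause a match. Each matching row is written to stdout in full, which is presumably what the stage needs in order to populate every column on the output link.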

Posted: Fri Oct 06, 2006 6:33 am
by kumar_s
If $1 is the column passed, try with

Code:

echo $1 | grep "A" or grep "A" $1
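
For comparison, here is a small sketch of the difference between data arriving on stdin and a value arriving as an argument. It assumes, as the "cat: val" error earlier in the thread suggests, that the Arguments setting reaches the script as the literal string val rather than as the column's contents. The sample rows and variable names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample rows standing in for what the stage would send on stdin.
rows='1,A
2,B
4,A'

# Rows piped to stdin: grep with no file operand reads and filters them.
from_stdin=$(printf '%s\n' "$rows" | grep 'A')

# Echoing the argument: grep only ever sees the literal string "val",
# which contains no "A", so nothing is selected.
from_arg=$(echo "val" | grep 'A' || true)

printf 'from stdin:\n%s\n' "$from_stdin"
printf 'from argument: [%s]\n' "$from_arg"
```

Under that assumption, the filtering has to happen on stdin; the argument is only useful for telling the script which column or value to look for.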

Posted: Thu Nov 02, 2006 11:36 am
by splayer
I am trying to reproduce this job. I created the sequential file without the headers and set grep as the Filter Command for the External_Filter stage. However, I am not clear on what should go into grepcmd.sh, or what the arguments for the External Filter stage should be.

I would appreciate any help.