Need help with an External Filter problem

Bob Kendall
Participant
Posts: 6
Joined: Wed Jul 13, 2005 5:11 pm
Location: El Dorado Hills, CA

Post by Bob Kendall »

I've written a Unix C program to be used as an external filter. The columns it returns are different from the columns fed to it (and many more rows are produced than are supplied to the filter).

When I run the DS job, the external filter is called three times (there are three input records) and, based on debug messages I've written to stderr that show up in the Director log, I'm getting the output I want. But the Data Set output stage that follows the filter is failing because it can't find one of the columns I supplied as input to the filter. I'm not outputting this column, it isn't in the output column list for the filter, and the Runtime Column Propagation check box for the filter stage is not checked.

On a side note, the column being complained about is also one that I didn't want in my input to the filter, but DS insisted on supplying it. I finally gave up and accepted it on input, even though I don't use it.

Another side note: I'm processing all three input records before the error shows up. I can't tell whether (a) the system waits until I've processed all the records before complaining about the first one, (b) only the last record has a problem, or (c) the system is complaining about something after the last record I put out.

As near as I can tell, I'm not writing anything extraneous at the end of the run. But even if I were, why would the stage complain about missing a field that I never told it should be there in the first place?

I'm pretty new at DS. Does anyone have any suggestions as to how to attack this problem?

(I previously posted this question in the server forum, but just realized that I should've posted it in the EE forum)

Thanks,
Bob Kendall
pnchowdary
Participant
Posts: 232
Joined: Sat May 07, 2005 2:49 pm
Location: USA

Post by pnchowdary »

Hi Bob,
You wrote: "On a side note, the column being complained about is also one that I didn't want in my input to the filter, but DS insisted on supplying it. I finally gave up and accepted it on input, even though I don't use it."

If you don't want a particular column that you are reading from the input, you can use a Transformer stage. In the Transformer, pass through all of the columns except the one you don't want.

Hope that helps.
Thanks,
Naveen
Bob Kendall
Participant
Posts: 6
Joined: Wed Jul 13, 2005 5:11 pm
Location: El Dorado Hills, CA

Post by Bob Kendall »

Naveen -

Thanks for the response.

Should I infer from what you say that the output from an External Filter must match the input to the filter?

Thanks,
Bob
Eric
Participant
Posts: 254
Joined: Mon Sep 29, 2003 4:35 am

Post by Eric »

The output schema of the external filter does not need to match the input schema to the filter.

You could try the following:

If you change the Data Set design schema, you should ensure that any previous datasets with the same name are deleted before running the job again (use Manager -> Tools -> Data Set Management).

Check that Runtime Column Propagation is off for the project, then re-compile and run again.

In the Designer, select "show stage validation errors" and see if any errors appear.
Bob Kendall
Participant
Posts: 6
Joined: Wed Jul 13, 2005 5:11 pm
Location: El Dorado Hills, CA

Post by Bob Kendall »

I tried building a simple case of the problem to be able to post more information here, but of course the simple case worked just fine. It was only after I threw in the Lookup and Transformer stages preceding the External Filter that things started to fail.

Here are some of the 'clues' that I find confusing that maybe someone else can relate to:
I'm returning fewer characters on the output of the External Filter than I receive on the input. I have the output declared as Varchar, and the string I'm returning is terminated with a \n. However, DS still complains with:
"External_Filter_1,0: On output connection 1 (for connection 1) at record 0: Field "IF_02_STMT_COMP" with 'delim=end' did not consume entire input"
I'm confused by the reference to 'output connection 1'. I thought all connection numbering started at zero, and since the External Filter only allows one output I would think that should be zero.

The second warning is:
"External_Filter_1,0: On output connection 1 (for connection 1) at record 0: Input buffer overrun at field "IF_02_STMT_COMP""
(IF_02_STMT_COMP is the last field on the input and output. I'm returning identical field values for all fields except the last one, which is totally different on the output than on the input.)
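
For anyone comparing notes on the warnings above, here is a minimal sketch of the general shape being described: a filter that reads text records on stdin and writes more rows, with a different column layout, on stdout. It is not the actual program from this thread. The pipe delimiter, the "key|item,item,..." input layout, the two-column output, and the name run_filter are all assumptions made up for illustration. The point it tries to show is that every output row carries exactly the fields declared for the output and ends with a single \n, with the incoming record terminator stripped so it cannot leak into the last field.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 4096  /* assumed upper bound on one input record */

/*
 * Hypothetical filter body: reads "key|item,item,..." records from `in`
 * and writes one "key|item" row per item to `out`.
 * Returns 0 on success, 1 on a malformed record or a write failure.
 */
int run_filter(FILE *in, FILE *out)
{
    char line[MAX_LINE];
    long recno = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        /* Strip the record terminator so it never ends up in the last field. */
        line[strcspn(line, "\r\n")] = '\0';

        char *sep = strchr(line, '|');
        if (sep == NULL) {
            /* Diagnostics go to stderr, never to stdout. */
            fprintf(stderr, "record %ld: missing '|' delimiter\n", recno);
            return 1;
        }
        *sep = '\0';
        const char *key = line;
        char *items = sep + 1;

        /* One output row per item: exactly two fields and one '\n'. */
        for (char *item = strtok(items, ","); item != NULL;
             item = strtok(NULL, ",")) {
            fprintf(out, "%s|%s\n", key, item);
        }
        recno++;
    }

    /* Flush so complete rows are written before the process exits. */
    return fflush(out) == 0 ? 0 : 1;
}
```

A real filter's main would just return run_filter(stdin, stdout). Whether the delimiter and record terminator used here line up with what the stage expects depends on the format properties set on the output columns, so treat those as things to check rather than as known values.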

That's about all I can offer right now...

Thanks,
Bob