Page 1 of 1

filter with regular expression

Posted: Tue Feb 13, 2007 7:39 am
by PeterPol
Hello,

I tried to filter records from a table using the following regular expression:

idnummer like '[P-Q][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'

(to get the idnummers that begin with P or Q directly followed by 7 digits).

However, when I run the job I get the following error:
Filter_3: Literal regular expression formatted incorrectly. Got: '[P-Q][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'.

idnummer has type varchar(8)

The strange thing is that when I trim the last [0-9] the expression seems valid: the job runs without error (although this is not what I want...)

Any ideas what goed wrong?

Peter

Posted: Tue Feb 13, 2007 2:22 pm
by ray.wurlod
It looks fine - as a regular expression - to me. Perhaps the Filter stage has an undocumented limit on the length of the regular expression. You ought to ask your support provider. Meanwhile can someone else with access to parallel jobs try to reproduce Peter's issue?

Posted: Wed Feb 14, 2007 2:00 am
by PeterPol
Hello,

Can it be that the filter stage does not support the use of brackets in expressions?

In the meantime I was experimenting with external filter stage (using awk) and I found it very useful for my purposes.
I want to build a job that checks if a field in a given dataset matches a certain (user given) pattern, like string starts with ZZ and is 10 long, or string contains only 'A' or 'B', or string is fully in lowercase, etc .

By using the external filter stage with Awk I was able to do this kind of filtering. However, what I actually want are the lines that do NOT match the given awk pattern.

suppose I have the following filter: "/^[Aa]b/"
How do I filter all fields that do not match this pattern?

Unfortunately, the external filter stage has no reject link...

Peter

Posted: Wed Feb 14, 2007 2:17 am
by ray.wurlod
So what you're looking for is an awk expert?

Can you filter it (in the External Filter stage) with grep, which may be more efficient, and in which it's easier to filter the logical inverse (that means using the -v option in the grep command).