Functions in DataStage

Posted: Tue Oct 22, 2013 1:17 pm
by Ranjini
How can I find four consecutive repetitions of the same letter in a string? We do not know the letter in advance; it can be anything from A to Z.

Posted: Tue Oct 22, 2013 1:23 pm
by ray.wurlod
Welcome aboard.

Do you have knowledge of the "alphabet" at job run time? If so you can make it a job parameter and use an Index() function to search for four contiguous occurrences of that value.

Code:

Index(InLink.TheColumn, "#jpLetter##jpLetter##jpLetter##jpLetter#", 1)
The result will be zero if not found, or some non-zero value (the location in the column) if found.
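
For readers who want to try the idea outside DataStage, here is a minimal sketch in Python rather than DataStage expression syntax. It assumes Index() reports a 1-based position and zero when the substring is absent, as described above; the function name index_of_run is hypothetical and only for illustration.

```python
def index_of_run(value: str, letter: str, run_length: int = 4) -> int:
    """Return the 1-based position of the first run of `letter`, or 0 if absent."""
    # str.find returns -1 when the substring is missing, so adding 1
    # gives 0 for "not found" and a 1-based position otherwise.
    return value.find(letter * run_length) + 1


print(index_of_run("NSAAAAI", "A"))  # 3: the run starts at the third character
print(index_of_run("NSAAAI", "A"))   # 0: only three A's, so no match
```

As with the job parameter approach, this only works when the letter to look for is known up front.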

Posted: Tue Oct 22, 2013 3:21 pm
by Developer9

Code:

Index(InLink.TheColumn,"AAAA",1)
The expression above gives the position as 2.

This record can be captured as a reject record if we check whether the result is zero or not (any non-zero value means the pattern was found).

Let me be clear about the requirement.
If the column from the file has the data NSAAAAI, then the record is rejected since there are 4 consecutive A's in it.

Posted: Tue Oct 22, 2013 5:24 pm
by ray.wurlod
Developer9 wrote: The expression above gives the position as 2.
Depends what's in InLink.TheColumn - which you did not indicate before making this assertion.
Developer9 wrote: Let me be clear about the requirement.
If the column from the file has the data NSAAAAI, then the record is rejected since there are 4 consecutive A's in it.
I did not read anything in the original requirement about rejecting records - only about determining whether there are four consecutive occurrences of the same "alphabet" (which I took to mean "alphabetic character").

Posted: Tue Oct 22, 2013 5:34 pm
by chulett
Index isn't going to be helpful here, unless perhaps you're willing to execute it 26 times. Seems to me you'll need something more like a Regular Expression to detect the presence of four contiguous occurrences of the same letter in your data.
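
To make the two options concrete, here is a rough sketch in Python (not DataStage syntax). The first function mimics running a substring search once per letter; the second uses a regular expression with a backreference. Both function names are hypothetical, and the pattern assumes upper-case A-Z only.

```python
import re
import string

# \1 repeats whatever letter group 1 captured; {3} asks for three more copies,
# which makes four identical letters in a row.
FOUR_IN_A_ROW = re.compile(r"([A-Z])\1{3}")


def has_run_by_search(value: str) -> bool:
    """Brute force: one substring test per letter, like 26 Index() calls."""
    return any(letter * 4 in value for letter in string.ascii_uppercase)


def has_run_by_regex(value: str) -> bool:
    """Single test using a regular expression with a backreference."""
    return FOUR_IN_A_ROW.search(value) is not None


for sample in ("NSAAAAI", "NSAAAI", "XYZZZZZ"):
    print(sample, has_run_by_search(sample), has_run_by_regex(sample))
```

Both checks should agree on every input; the regular expression simply avoids spelling out each letter.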

Posted: Tue Oct 22, 2013 7:50 pm
by ray.wurlod
chulett wrote: Seems to me you'll need something more like a Regular Expression to detect the presence of four contiguous occurrences of the same letter in your data.
This is do-able in DataStage if you have the Data Rules stage, which implies version 8.7FP1 or later and an Information Analyzer licence. One of the possible tests for this stage is whether or not the data matches a regular expression (matches_regex).

Otherwise you could create a BuildOp or leverage the Java capabilities of DataStage to test the regular expression.

Yet another possibility would be to use grep in an External Filter stage.
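
As an illustration of the external filter idea, here is a small Python sketch of a line filter, standing in for grep rather than reproducing it. It assumes the filter command receives one record per line and should pass through only the records without a run of four identical upper-case letters; the name filter_stream and the sample data are hypothetical.

```python
import io
import re

RUN_OF_FOUR = re.compile(r"([A-Z])\1{3}")


def filter_stream(source, sink) -> None:
    """Copy lines from source to sink, dropping lines with a four-letter run."""
    for line in source:
        if not RUN_OF_FOUR.search(line):
            sink.write(line)


# Demonstration with in-memory streams; a real filter command would call
# filter_stream(sys.stdin, sys.stdout) instead.
demo_in = io.StringIO("NSAAAAI\nNSAAAI\nABCD\n")
demo_out = io.StringIO()
filter_stream(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Inverting the test in filter_stream would keep only the offending records instead, which may suit a reject link better.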

Posted: Wed Oct 23, 2013 6:32 am
by priyadarshikunal
I recall regular expression functionality being available in the Filter stage from some 8.x version onwards. Can't that be used?

Posted: Wed Oct 23, 2013 8:55 am
by Developer9
I was able to check the position of the A's with my expression... maybe I need a little more research before responding :?
Thanks