Check for Substring in a string - substring in reference fil

devidotcom · Post by **devidotcom** » Fri Nov 23, 2007 12:44 am

Hi All,

I have a requirement to count the number of occurance of a substring in a string. I am aware of the count function which does this for us.

The issue is that the substring values are 8 and I have to use this function 8 times... with OR condition between them. This is fine also.

But these 8 substring values are in a sequential file. I have to access these values from there. We may have a requirement of these values extending to more than 8 in future.

The input file is a dataset.

Any help on this.

Thanks

ray.wurlod · Post by **ray.wurlod** » Fri Nov 23, 2007 3:04 am

The best function is Index(). Its task is to search for (the nth occurrence of) a substring within a string.

devidotcom · Post by **devidotcom** » Fri Nov 23, 2007 3:23 am

Thanks Ray.

I am not sure if you understand the question:

Reference file contains: ACCT,ACCOUNT,TRUST

Input records are
BANK ACCOUNT
CREDIT ACCT
TRUST ACCT
HAPPY WORLD

Now we need to check the occurance of the words in the reference to the input file...

Output
Column name, Occurance
BANK ACCOUNT, 1
CREDIT ACCT,1
TRUST ACCT,1
HAPPY WORLD,0
How to implement the job by reading the reference file contents.
First issue is how to read the file and use the Index() function if I have to use it...

ray.wurlod · Post by **ray.wurlod** » Fri Nov 23, 2007 4:29 am

Code: Select all

Index(RefLink.ColName, "TRUST", 1) > 0 Or Index(RefLink.ColName, "ACCOUNT", 1) > 0 Or Index(RefLink.ColName, "ACCT", 1) > 0

Put the most likely one first and the least likely one last, for maximum efficiency.

dsusr · Post by **dsusr** » Fri Nov 23, 2007 5:24 am

Bit of crude logic on top of my mind.....

1) Just create your reference file as each word in a separate row with same key value asume it to be 1 so that your file wud look like:-

1, ACCT
1, ACCOUNTANT
1, TRUST

Even if you have a file with single row you can do such kind of transformation in the job as well.

2) In the input link add the field with key value 1 so that you can join with each record of refence file using this key value

3) After the join you will get the folowing output:-

key, Input Field, Reference field
1, BANK ACCOUNT, ACCT
1, BANK ACCOUNT, ACCOUNTANT
1, BANK ACCOUNT, TRUST
1, CREDIT ACCT, ACCT
1, CREDIT ACCT, ACCOUNTANT
and so on

3) Just use a tranformer to count the staring in Reference field in the Input Field data using Count() function.

4) Put an aggregator after the trasnsformer to group the data on Key field and Input field data and sum the count values.

Although it seems a bit crude logic as the number of records would become very large at the input to the transformer.

In this logic you can in future add all the values in refernce files.

Will try to think of some other logic as well since I dont have any work to do at my client location

--------
dsusr