Page 1 of 1

Check for Substring in a string - substring in reference fil

Posted: Fri Nov 23, 2007 12:44 am
by devidotcom
Hi All,

I have a requirement to count the number of occurance of a substring in a string. I am aware of the count function which does this for us.

The issue is that the substring values are 8 and I have to use this function 8 times... with OR condition between them. This is fine also.

But these 8 substring values are in a sequential file. I have to access these values from there. We may have a requirement of these values extending to more than 8 in future.

The input file is a dataset.

Any help on this.

Thanks

Posted: Fri Nov 23, 2007 3:04 am
by ray.wurlod
The best function is Index(). Its task is to search for (the nth occurrence of) a substring within a string.

Posted: Fri Nov 23, 2007 3:23 am
by devidotcom
Thanks Ray.

I am not sure if you understand the question:

Reference file contains: ACCT,ACCOUNT,TRUST

Input records are
BANK ACCOUNT
CREDIT ACCT
TRUST ACCT
HAPPY WORLD


Now we need to check the occurance of the words in the reference to the input file...

Output
Column name, Occurance
BANK ACCOUNT, 1
CREDIT ACCT,1
TRUST ACCT,1
HAPPY WORLD,0
How to implement the job by reading the reference file contents.
First issue is how to read the file and use the Index() function if I have to use it...

Posted: Fri Nov 23, 2007 4:29 am
by ray.wurlod

Code: Select all

Index(RefLink.ColName, "TRUST", 1) > 0 Or Index(RefLink.ColName, "ACCOUNT", 1) > 0 Or Index(RefLink.ColName, "ACCT", 1) > 0 
Put the most likely one first and the least likely one last, for maximum efficiency.

Posted: Fri Nov 23, 2007 5:24 am
by dsusr
Bit of crude logic on top of my mind.....

1) Just create your reference file as each word in a separate row with same key value asume it to be 1 so that your file wud look like:-

1, ACCT
1, ACCOUNTANT
1, TRUST

Even if you have a file with single row you can do such kind of transformation in the job as well.

2) In the input link add the field with key value 1 so that you can join with each record of refence file using this key value

3) After the join you will get the folowing output:-

key, Input Field, Reference field
1, BANK ACCOUNT, ACCT
1, BANK ACCOUNT, ACCOUNTANT
1, BANK ACCOUNT, TRUST
1, CREDIT ACCT, ACCT
1, CREDIT ACCT, ACCOUNTANT
and so on

3) Just use a tranformer to count the staring in Reference field in the Input Field data using Count() function.

4) Put an aggregator after the trasnsformer to group the data on Key field and Input field data and sum the count values.

Although it seems a bit crude logic as the number of records would become very large at the input to the transformer.

In this logic you can in future add all the values in refernce files.

Will try to think of some other logic as well since I dont have any work to do at my client location ;)

--------
dsusr