Hi All,
I have a requirement to count the number of occurance of a substring in a string. I am aware of the count function which does this for us.
The issue is that the substring values are 8 and I have to use this function 8 times... with OR condition between them. This is fine also.
But these 8 substring values are in a sequential file. I have to access these values from there. We may have a requirement of these values extending to more than 8 in future.
The input file is a dataset.
Any help on this.
Thanks
Check for Substring in a string - substring in reference fil
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 247
- Joined: Thu Apr 27, 2006 6:38 am
- Location: Hyderabad
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Participant
- Posts: 247
- Joined: Thu Apr 27, 2006 6:38 am
- Location: Hyderabad
Thanks Ray.
I am not sure if you understand the question:
Reference file contains: ACCT,ACCOUNT,TRUST
Input records are
BANK ACCOUNT
CREDIT ACCT
TRUST ACCT
HAPPY WORLD
Now we need to check the occurance of the words in the reference to the input file...
Output
Column name, Occurance
BANK ACCOUNT, 1
CREDIT ACCT,1
TRUST ACCT,1
HAPPY WORLD,0
How to implement the job by reading the reference file contents.
First issue is how to read the file and use the Index() function if I have to use it...
I am not sure if you understand the question:
Reference file contains: ACCT,ACCOUNT,TRUST
Input records are
BANK ACCOUNT
CREDIT ACCT
TRUST ACCT
HAPPY WORLD
Now we need to check the occurance of the words in the reference to the input file...
Output
Column name, Occurance
BANK ACCOUNT, 1
CREDIT ACCT,1
TRUST ACCT,1
HAPPY WORLD,0
How to implement the job by reading the reference file contents.
First issue is how to read the file and use the Index() function if I have to use it...
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Code: Select all
Index(RefLink.ColName, "TRUST", 1) > 0 Or Index(RefLink.ColName, "ACCOUNT", 1) > 0 Or Index(RefLink.ColName, "ACCT", 1) > 0
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Bit of crude logic on top of my mind.....
1) Just create your reference file as each word in a separate row with same key value asume it to be 1 so that your file wud look like:-
1, ACCT
1, ACCOUNTANT
1, TRUST
Even if you have a file with single row you can do such kind of transformation in the job as well.
2) In the input link add the field with key value 1 so that you can join with each record of refence file using this key value
3) After the join you will get the folowing output:-
key, Input Field, Reference field
1, BANK ACCOUNT, ACCT
1, BANK ACCOUNT, ACCOUNTANT
1, BANK ACCOUNT, TRUST
1, CREDIT ACCT, ACCT
1, CREDIT ACCT, ACCOUNTANT
and so on
3) Just use a tranformer to count the staring in Reference field in the Input Field data using Count() function.
4) Put an aggregator after the trasnsformer to group the data on Key field and Input field data and sum the count values.
Although it seems a bit crude logic as the number of records would become very large at the input to the transformer.
In this logic you can in future add all the values in refernce files.
Will try to think of some other logic as well since I dont have any work to do at my client location
--------
dsusr
1) Just create your reference file as each word in a separate row with same key value asume it to be 1 so that your file wud look like:-
1, ACCT
1, ACCOUNTANT
1, TRUST
Even if you have a file with single row you can do such kind of transformation in the job as well.
2) In the input link add the field with key value 1 so that you can join with each record of refence file using this key value
3) After the join you will get the folowing output:-
key, Input Field, Reference field
1, BANK ACCOUNT, ACCT
1, BANK ACCOUNT, ACCOUNTANT
1, BANK ACCOUNT, TRUST
1, CREDIT ACCT, ACCT
1, CREDIT ACCT, ACCOUNTANT
and so on
3) Just use a tranformer to count the staring in Reference field in the Input Field data using Count() function.
4) Put an aggregator after the trasnsformer to group the data on Key field and Input field data and sum the count values.
Although it seems a bit crude logic as the number of records would become very large at the input to the transformer.
In this logic you can in future add all the values in refernce files.
Will try to think of some other logic as well since I dont have any work to do at my client location
--------
dsusr