Page 1 of 1

Pattern Matching Lookup

Posted: Wed Nov 23, 2016 3:47 pm
by lna_dsuser
Hi Gurus,

We have a requirement to verify if the data string from the master data set includes the string from the lookup reference dataset.

e.g. If the source data element is 'Made in USA' and the reference data is a list of countries and includes 'USA', it should be a match.

Master Data:

Made in USA
USA
United States of America(USA)
Manufactured in China
Russian
Japanese

Lookup Data:

USA
Russia
India
Canada

Output Data:

Made in USA
USA
United States of America(USA)
Russian

The source and reference data resides in text file(s) so using a database would not be possible.

Thanks

Posted: Wed Nov 23, 2016 4:48 pm
by chulett
Not that this answers your question but... your text files could certainly be made to reside in a database if that would help. :wink:

And just to clarify, you're not looking for a token match (whole words) but rather a substring match? Asking because you have "Russian" as master data matching to "Russia" in the lookup data.

Posted: Thu Nov 24, 2016 9:27 am
by JRodriguez
The main task here will be to generate a constraint to pass the good records. It should be dinamically created to counter in new lookup values. A regular lookup won't work, you will need either count or index function to detect the lookup value existance in the master record. So here is one way that could be approach it

All below steps could be done in a single job or two jobs

1- Read the lookup values and in a transformer concat all the values in a pipe delimited string. Add an artificial key to serve as a join key with the master record, the record coming out of the transformer will look like below and will account for new future values added to the lookup table
"PKey01","USA|Russia|India|Canada"

Now to process the master data and check against the lookup values do:

2- Read your master data, on the fly add a field and set it as ""PKey01", same value that was added to the lookup record
-Join the master data with the lookup record using the field containing the "PKey01" to get both the master and lookup data in same record, something like
"PKey01" + "All Master data fields", "USA|Russia|India|Canada"
- On a transformer, do a loop base on the field that contain the pipe delimited lookup values, and in a loop variable check if the lookup value exist in your master record. Notice that the loop will be taking each value from the pipe delimited string
If Index(MasterRecordValue,PipeDelimitedLookupValue,1) > 1 Then 'Y' Else 'N'

3- Finally, add a constraint to only pass the "Y" records in to your output

Give it a try and get back to us if you encounter issues


Regards