Pattern Matching Lookup

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
lna_dsuser
Premium Member
Premium Member
Posts: 29
Joined: Tue Aug 14, 2012 1:06 pm

Pattern Matching Lookup

Post by lna_dsuser »

Hi Gurus,

We have a requirement to verify if the data string from the master data set includes the string from the lookup reference dataset.

e.g. If the source data element is 'Made in USA' and the reference data is a list of countries and includes 'USA', it should be a match.

Master Data:

Made in USA
USA
United States of America(USA)
Manufactured in China
Russian
Japanese

Lookup Data:

USA
Russia
India
Canada

Output Data:

Made in USA
USA
United States of America(USA)
Russian

The source and reference data resides in text file(s) so using a database would not be possible.

Thanks
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Not that this answers your question but... your text files could certainly be made to reside in a database if that would help. :wink:

And just to clarify, you're not looking for a token match (whole words) but rather a substring match? Asking because you have "Russian" as master data matching to "Russia" in the lookup data.
-craig

"You can never have too many knives" -- Logan Nine Fingers
JRodriguez
Premium Member
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City
Contact:

Post by JRodriguez »

The main task here will be to generate a constraint to pass the good records. It should be dinamically created to counter in new lookup values. A regular lookup won't work, you will need either count or index function to detect the lookup value existance in the master record. So here is one way that could be approach it

All below steps could be done in a single job or two jobs

1- Read the lookup values and in a transformer concat all the values in a pipe delimited string. Add an artificial key to serve as a join key with the master record, the record coming out of the transformer will look like below and will account for new future values added to the lookup table
"PKey01","USA|Russia|India|Canada"

Now to process the master data and check against the lookup values do:

2- Read your master data, on the fly add a field and set it as ""PKey01", same value that was added to the lookup record
-Join the master data with the lookup record using the field containing the "PKey01" to get both the master and lookup data in same record, something like
"PKey01" + "All Master data fields", "USA|Russia|India|Canada"
- On a transformer, do a loop base on the field that contain the pipe delimited lookup values, and in a loop variable check if the lookup value exist in your master record. Notice that the loop will be taking each value from the pipe delimited string
If Index(MasterRecordValue,PipeDelimitedLookupValue,1) > 1 Then 'Y' Else 'N'

3- Finally, add a constraint to only pass the "Y" records in to your output

Give it a try and get back to us if you encounter issues


Regards
Julio Rodriguez
ETL Developer by choice

"Sure we have lots of reasons for being rude - But no excuses
Post Reply