Lookup on variable length values

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
mgendy
Premium Member
Premium Member
Posts: 44
Joined: Thu Sep 10, 2009 5:30 am
Contact:

Lookup on variable length values

Post by mgendy »

hi,
i have a new challenge in using lookup stage , i want to do the following

* in a lookup table i have variable length data from 3 to 7 characters length , each character represent an information , my stream data contains a little longer string , which i want to lookup for a substring of it starting with the maximum subsctring length which is 7 , if found data for it in lookup table , Continue if it didn't found the data , then take substring of 6 and then go , the following example may illustrate

Lookup data key
0011
0012468
00115
001154
00116

input data
001246899778 ->> i need 0012468 directly from the lookup table

001157894545 ->> here i need 00115 because 0011578 & 001157 are not exist in the table

001174564564 ->> here i need 001 because 0011745 & 001174 & 00117 and 0011 don't exist in the lookup table


kindly help me to find a solution for this issue using lookup stage or any other aporpirate stage , any feedback will be appreciated

Gendy
Mohmmed Elgendy
Senior System Analyst
Data IntegrationTeam
Etisalat Egypt
+20 1118511161
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As far as I know, you'll need cascading lookups - five of them set to continue if they fail. First one checks for a substring of 7, the next 6, etc.
-craig

"You can never have too many knives" -- Logan Nine Fingers
mgendy
Premium Member
Premium Member
Posts: 44
Joined: Thu Sep 10, 2009 5:30 am
Contact:

Post by mgendy »

chulett wrote:As far as I know, you'll need cascading lookups - five of them set to continue if they fail. First one checks for a substring of 7, the next 6, etc. ...
Many Thanks chulett for your reply and interest , i've thinked in that but is there is any other method to make it dynamicaly , i'm afraid of any changes happens for the lookup digits in the future , may be they increasd to 8 or 9 ,,,
i really apreciate your support chulett
Mohmmed Elgendy
Senior System Analyst
Data IntegrationTeam
Etisalat Egypt
+20 1118511161
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hmmm... perhaps match based on three, allow multiple values to be returned from the lookup, order by length desc and take the first match per group? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

Or potentially a sparse lookup joining on a substring of the lookup columns length...
John Smith
Charter Member
Charter Member
Posts: 193
Joined: Tue Sep 05, 2006 8:01 pm
Location: Australia

Post by John Smith »

mgendy wrote:
chulett wrote:As far as I know, you'll need cascading lookups - five of them set to continue if they fail. First one checks for a substring of 7, the next 6, etc. ...
Many Thanks chulett for your reply and interest , i've thinked in that but is there is any other method to make it dynamicaly , i'm afraid of any changes happens for the lookup digits in the future , may be they increasd to 8 or 9 ,,,
i really apreciate your support chulett
Well if it increase to 8 or 9 change your jobs to match. This is where change management is important. That represents a change to the business rules and should be managed accordingly. It's the same with any other job design;you can only build what you know now. Trying to speculate about the future and trying to make your job bullet proof will take tremendous effort and perhaps not worth the trouble.
Sorry if I can't be more helpful.
mgendy
Premium Member
Premium Member
Posts: 44
Joined: Thu Sep 10, 2009 5:30 am
Contact:

Post by mgendy »

chulett wrote:Hmmm... perhaps match based on three, allow multiple values to be returned from the lookup, order by length desc and take the first match per group? :? ...
that is great idea , but unfortunately it won't work for me , because when takingthe lookup on substring of 3,it will give many results which som of them are false incase of same length, let me sho you

lookup values
001
001234
001451
001982

if i took 001 , and then the greater value then it i will give wrong values

it seem that i'll use cascaded lookups ,, many thanks for your greate support
Mohmmed Elgendy
Senior System Analyst
Data IntegrationTeam
Etisalat Egypt
+20 1118511161
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You could try incorporating some kind of substring check based on result length and key value in the transformer to weed out the false positives... maybe. Sorta. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

You can do in several ways.

Simplest will be to do something as below

Code: Select all

SELECT max(lpad(matchAreaCode, maxFieldLength,'0'))
FROM    refTable
WHERE  
matchAreaCode = substr(sourceAreaCode, 1, length(matchAreaCode))
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Please make a post telling us, and future searchers, HOW you resolved this issue.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply