Cleanse/Confirm Question ?

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

ameyvaidya
Charter Member
Charter Member
Posts: 166
Joined: Wed Mar 16, 2005 6:52 am
Location: Mumbai, India

Post by ameyvaidya »

Hi Kaps,

I believe this is what you are looking for:

Issue:
Do multiple simultaneous lookups on a single translation table. Multiple source columns need to be mapped to the same reference column and the translated(standard) value needs to be passed to target.

The job design can be:

Code: Select all


DB2 Src for rule table--->Hashed File(Rule Table loaded into HF)                        
                               |(Lkp1)|(Lkp2)|(Lkp3)
                               |      |      |
                               |      |      |
Source------------------------>Transformer    -------------->Target Table or File. 


LKP1: Lookup for Gender
LKP2: Lookup for State
LKP3: Lookup for Cntry_Code

Make sure the "Preload to Memory" option at the output of the HashedFile stage is enabled

The lookup logic is as follows:
Primary input to ref transformer:
Source_ID
Gender
State
Cntry_Code

Lkp1:
Source_ID [Key Yes] <--(Key Derivation)-- Primary.Source_ID
Source_Value [Key Yes] <--(Key Derivation)-- Primary.Gender
Cleansed_Value [key No] (Map to Target as Gender)

Lkp2:
Source_ID [Key Yes] <--(Key Derivation)-- Primary.Source_ID
Source_Value [Key Yes] <--(Key Derivation)-- Primary.State
Cleansed_Value [key No] (Map to Target as State)

Lkp3:
Source_ID [Key Yes] <--(Key Derivation)-- Primary.Source_ID
Source_Value [Key Yes] <--(Key Derivation)-- Primary.Cntry_Code
Cleansed_Value [key No] (Map to Target as Cntry_Code)

Output of transformer:
Source_ID<--(Derivation)--Primary.Source_ID
Gender<--(Derivation)--Lkp1.Cleansed_Value
State<--(Derivation)--Lkp2.Cleansed_Value
Cntry_Code<--(Derivation)--Lkp3.Cleansed_Value

IHTH
Amey Vaidya<i>
I am rarely happier than when spending an entire day programming my computer to perform automatically a task that it would otherwise take me a good ten seconds to do by hand.</i>
<i>- Douglas Adams</i>
I_Server_Whale
Premium Member
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

Amey Vaidya,

That is not what kaps is looking for. And the previous posts make this clear. The suggestion made by gateleys is also the same.

Kaps doesn't want to do multiple look-ups.

Kaps,

I was thinking about it and I kind of used stage variables in my logic and no hashed file. But I couldn't run the job as I had problems with getting the database up on my machine. So, not 100% percent sure if it works. But you can give it a shot.

However, I have created a pdf of the design which basically shows the logic inside the transformer of the job. In this design, DSLink4 is your source link, DSLink5 is your Rules Table Link (remember no hashed file used, looking up the database with Source_ID as the key) and DSLink6 is your target link.

You can download the pdf file from here:

pdf

If you try it out. Please be kind enough to notify me about the outcome.

Thanks,
Naveen.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
ameyvaidya
Charter Member
Charter Member
Posts: 166
Joined: Wed Mar 16, 2005 6:52 am
Location: Mumbai, India

Post by ameyvaidya »

naveendronavalli wrote:Amey Vaidya,

That is not what kaps is looking for. And the previous posts make this clear. The suggestion made by gateleys is also the same.

Kaps doesn't want to do multiple look-ups.
Sorry Naveen; But I disagree!
gateleys wrote:A simple way would be to have 3 hashed files, each with two columns, both of which are keys. example hfile1(SourceID, Gender), hfile2(SourceID, State), hfile3(SourceID, Cntry_Code). Next, do a lookup of your source rows against these hashed files.
kaps wrote:thats just an example I was describing...I can't create a table for every field which need to be cleansed.
If you see my job design, I need only the main lookup table dumped to a hashed file. There is no need of splitting into 3 hashed files as gateleys has recommended.

Also please note that I've used a single Hashed file stage and have drawn multiple links from it. Atleast from what an Ascential/IBM consultant has told me.. that means that the Hashed File is preloaded Only Once into memory(Gurus ... need confirmation of this theory!!).

I had faced this same requirement.. and this is the solution I'd used and it does work.
Amey Vaidya<i>
I am rarely happier than when spending an entire day programming my computer to perform automatically a task that it would otherwise take me a good ten seconds to do by hand.</i>
<i>- Douglas Adams</i>
ameyvaidya
Charter Member
Charter Member
Posts: 166
Joined: Wed Mar 16, 2005 6:52 am
Location: Mumbai, India

Post by ameyvaidya »

naveendronavalli wrote:
However, I have created a pdf of the design which basically shows the logic inside the transformer of the job. In this design, DSLink4 is your source link, DSLink5 is your Rules Table Link (remember no hashed file used, looking up the database with Source_ID as the key) and DSLink6 is your target link.
Also If you look at the example kaps has given and link to reference only on Source_id, for a single given input row, there will be multiple rows on the reference link that qualify.

IHTH
Amey Vaidya<i>
I am rarely happier than when spending an entire day programming my computer to perform automatically a task that it would otherwise take me a good ten seconds to do by hand.</i>
<i>- Douglas Adams</i>
I_Server_Whale
Premium Member
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

Amey,

Perfect! Bingo. I couldn't disagree with you this time after I read carefully.

Kaps! Amey's solution is the right way to approach it. You can also try the method suggested by me and compare the performance of both the designs.

Keep it up!

Thanks,
Naveen.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
I_Server_Whale
Premium Member
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

ameyvaidya wrote:
naveendronavalli wrote:
However, I have created a pdf of the design which basically shows the logic inside the transformer of the job. In this design, DSLink4 is your source link, DSLink5 is your Rules Table Link (remember no hashed file used, looking up the database with Source_ID as the key) and DSLink6 is your target link.
Also If you look at the example kaps has given and link to reference only on Source_id, for a single given input row, there will be multiple rows on the reference link that qualify.

IHTH
But I'm checking for this qualification using the stage variables. I should have the run the job and checked the output. Anyways, Do you think that this design will fail??

Thanks,
Naveen.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
kaps
Participant
Posts: 452
Joined: Tue May 10, 2005 12:36 pm

Post by kaps »

Thanks Amey and Naveen for your time...

It looks like Amey has given the solution. Let me test it and keep you guys posted.

Thanks
kaps
Participant
Posts: 452
Joined: Tue May 10, 2005 12:36 pm

Post by kaps »

Amey,
Thanks...Your logic works fine. Only thing is I need to create as many links as the number of cleanse fields. But better than having many Hashed file stages.

Naveen,
I have not tested ur solution yet. I will do it and let you know. Meanwhile if u had a chance to test it Please post the result here...

Thanks
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

This is known as Code to Name or Name to code Look-up. :!: :!: :shock:
These translations are defined by the users.
KAPS, you stated the question in a wrong way. This is not called cleanse/confirm. :x
The most effecient and generic way to do this is using Hashed files in Server or Lookup stage in EE.
good luck :)
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ameyvaidya
Charter Member
Charter Member
Posts: 166
Joined: Wed Mar 16, 2005 6:52 am
Location: Mumbai, India

Post by ameyvaidya »

naveendronavalli wrote: But I'm checking for this qualification using the stage variables. I should have the run the job and checked the output. Anyways, Do you think that this design will fail??

Thanks,
Naveen.
Hi Naveen,
The issue with this is that multiple columns per row need to be translated using the reference table. So the unique key for the referernce table is the source_id and the source_value_to_be_translated. A lookup on only the Source_Id will have multiple rows qualifying from the reference table( exactly, one per column to be translated in the source row). That will need to be handled in the transformer. I dont know how this is done, yet. But a Hashed File would return only the first row.

Looking at kaps example:

Code: Select all

Source Record (File) 
-------------------- 
Source ID   Gender      State      Cntry_Code    
----------------------------------------------------------- 
SRC1      Male      Nebraska   United States 


RuleTable(DataBase) 
-------------------- 
Source ID   Source Value      Cleansed Value       
------------------------------------------------------- 
SRC1      Male         M 
SRC1      Nebraska      NE 
SRC1      United States      USA 


Reference Link data for Input row:
Source ID   Source Value      Cleansed Value       
------------------------------------------------------- 
SRC1      Male         M 


Therefore Output:
------ 
Source ID   Gender      State      Cntry_Code    
----------------------------------------------------------- 
SRC1      M      ""<Else Value>      ""<else Value>

Amey Vaidya<i>
I am rarely happier than when spending an entire day programming my computer to perform automatically a task that it would otherwise take me a good ten seconds to do by hand.</i>
<i>- Douglas Adams</i>
ravij
Premium Member
Premium Member
Posts: 170
Joined: Mon Oct 10, 2005 7:04 am
Location: India

Post by ravij »

Hi kaps,
thats just an example I was describing...I can't create a table for every field which need to be cleansed.
You need not create any table individually. Just you have to create 3 hashed files and map them with source file in the Transformer Stage.
Ravi
Post Reply