Page 1 of 1

Date Range Lookups

Posted: Mon Sep 11, 2006 2:52 am
by Andal
Hi All

I want to perform a date range lookup Something like

Code: Select all

Source.key1 = lkp.key1 and
source.key2 = lkp.key2 and
lkp.EFF_START_DT <= Source.EFF_DT and 
lkp.EFF_END_DT >= Source.EFF_DT 
Now we are doing a lookup directly with the database. But this affects the performance very much. we are getting 10+ rows per second only.

And based upon the suggestions from this forum, i tried using Universe Stage also, but it doesnt improve the performance much.

And in some other posts viewtopic.php?t=102858, DSguru2B has mentioned that we can do this with the Hashed File by intorducing a dummy Key Filed and match it with @INROWNUM

Code: Select all

if  NOT(RefLink.NOTFOUND) AND main_id = lookup_id AND ICONV(dt_record,"D/MDY[2,2,4]") >= ICONV(dt_start,"D/MDY[2,2,4]") AND ICONV(dt_record,"D/MDY[2,2,4]") <= ICONV(dt_end,"D/MDY[2,2,4]") then 'P' else 'N' 
My Question is, how this logic will work.
1) If we mention KEY1 and KEY2 alone as key columns in the Hashed file , then the values can be overwritten.

Code: Select all

For Ex.

Key1$Key2$Desc$Eff_Start_Dt$Eff_End_Dt
A$B$TestDesc$01/01/2005$12/31/2005 
A$B$TestDesc1$01/01/2006$12/31/2006
We will have the Second row only in the Hashed File if we make KEY1 and KEY2 as keycolumns.

2) If we make the Date fields also as KeyColumns and if we pass EFF_DT form the source, the lookup wont match because it will try to do a equivalent match.

3) This will be same case, if we introduce a dummy sequential key field in Hashed file and match it with the @INROWNUM.

Please Suggest me how to proceed with this.

Note :

My Source and target is having around 1 million rows.

Posted: Mon Sep 11, 2006 3:07 am
by loveojha2
If we mention KEY1 and KEY2 alone as key columns in the Hashed file , then the values can be overwritten
Instead of Hashed File use Universe Stage with Key1 and Key2 as keys not the date column.

Select multi row lookup as checked.

Once rows are looked up (more than one may be), check for the date condition outside the lookup, within the transformer (may be while writing to the link as link constraint.

Posted: Tue Sep 12, 2006 9:55 pm
by Andal
Hi Love

I have already tried with universe stage and it doesnt improve the performance much.

Posted: Tue Sep 12, 2006 10:40 pm
by DSguru2B
The post that you have referenced, read it carefully. In that you only specify the dummy column (running number) as the key and no other key. This way you will retain each and every row and not lose any. Then you can use my derivation to get the range lookup.

Posted: Tue Sep 12, 2006 10:45 pm
by kumar_s
It is actually populating the records with the intermediate values between the to and from range. But again it will chalange the performace for the 1million row lookup. But is has advantage of equality predicate.

Posted: Wed Sep 13, 2006 1:15 am
by ray.wurlod
You might also contemplate indexing the search fields in the hashed file (the UV table).

Posted: Wed Sep 13, 2006 1:16 am
by Andal
DSguru2B wrote:The post that you have referenced, read it carefully. In that you only specify the dummy column (running number) as the key and no other key. This way you will retain each and every row and not lose any. Then you can use my derivation to get the range lookup.
If we match the @INROWNUM with dummy sequential key, I hope we wont get the desired results. Suppose for my first row from the source KEY1 and KEY2 are "AA" and "BB". And in the Hashed File the dummy keycolumn "1" may contain some other values for original lookup keys, something like "CC" and "DD".

This will fail in

Code: Select all

if  NOT(RefLink.NOTFOUND) AND main_id = lookup_id
Please explain.

Posted: Wed Sep 13, 2006 11:01 am
by DSguru2B
The dummy key in the hashed file will be a running number, like 1,2,3...and so on. Inside the transformer, the expression of the key column will be @INROWNUM. This way each and every record will be looked up (NOTE: No matching with the source is done at this stage)
All the matching is done in the stage variables. In your case you have two keys so the derivation will be

Code: Select all

if  NOT(RefLink.NOTFOUND) AND SRC.KEY1= REF.KEY1 AND SRC.KEY2=REF.KEY2 and so on .....

Posted: Wed Sep 13, 2006 9:31 pm
by Andal
Please forgive me If i am so dumb to understand the logic. Let me explain my doubt with some example

Snapshot of my first 2 rows in Source

Code: Select all

Key1$Key2$EFF_DT
DD$EE$01/01/2005
GG$HH$01/01/2005
Snapshot of my Hashed File

Code: Select all

DummyKey$SRCMATCH1$SRCMATCH2$LKPDESC$EFF_START_DT$EFF_END_DT
1$CC$DD$DESC1$01/01/2005$12/31/2005
2$CC$DD$DESC2$01/01/2006$12/31/2006
3$DD$EE$DESC1$01/01/2005$12/31/2005
4$DD$EE$DESC2$01/01/2006$12/31/2006
I had shown some sample data above. In the Hashed file DummyKey is the only keycolumn. So from the source if the first row comes @INROWNUM will be 1, and if we do a match with dummykey alone it will find a match and it will return the first row.

So if we check something like

Code: Select all

if  NOT(RefLink.NOTFOUND) AND SRC.KEY1= REF.KEY1 AND SRC.KEY2=REF.KEY2
in the matching condition SRC.KEY1 = REF.KEY1 it will fail, but i am having a matching condition in the third row.

Please explain

Posted: Wed Sep 13, 2006 10:44 pm
by DSguru2B
You are getting a matching record in the third row because both the keys match with the first record of your source sample data and the date is in between the start and end date. Isnt that what you want???

How it works is that with @INROWNUM in the expression of the hashed file key (dummy key), each and every row will be looked up, but what row goes across to the target link, will be decided by the stage variable. So basically you are putting the conditions in stage variables and constraining the output, hence twisting the logic to work like a range lookup.

Posted: Wed Sep 13, 2006 11:40 pm
by Andal
Thanks for the Reply, i will check it out by today.