Date Range Lookups

Andal · Post by **Andal** » Mon Sep 11, 2006 2:52 am

Hi All

I want to perform a date range lookup Something like

Source.key1 = lkp.key1 and
source.key2 = lkp.key2 and
lkp.EFF_START_DT <= Source.EFF_DT and 
lkp.EFF_END_DT >= Source.EFF_DT

Now we are doing a lookup directly with the database. But this affects the performance very much. we are getting 10+ rows per second only.

And based upon the suggestions from this forum, i tried using Universe Stage also, but it doesnt improve the performance much.

And in some other posts viewtopic.php?t=102858, DSguru2B has mentioned that we can do this with the Hashed File by intorducing a dummy Key Filed and match it with @INROWNUM

Code: Select all

if  NOT(RefLink.NOTFOUND) AND main_id = lookup_id AND ICONV(dt_record,"D/MDY[2,2,4]") >= ICONV(dt_start,"D/MDY[2,2,4]") AND ICONV(dt_record,"D/MDY[2,2,4]") <= ICONV(dt_end,"D/MDY[2,2,4]") then 'P' else 'N'

My Question is, how this logic will work.
1) If we mention KEY1 and KEY2 alone as key columns in the Hashed file , then the values can be overwritten.

Code: Select all

For Ex.

Key1$Key2$Desc$Eff_Start_Dt$Eff_End_Dt
A$B$TestDesc$01/01/2005$12/31/2005 
A$B$TestDesc1$01/01/2006$12/31/2006

We will have the Second row only in the Hashed File if we make KEY1 and KEY2 as keycolumns.

2) If we make the Date fields also as KeyColumns and if we pass EFF_DT form the source, the lookup wont match because it will try to do a equivalent match.

3) This will be same case, if we introduce a dummy sequential key field in Hashed file and match it with the @INROWNUM.

Please Suggest me how to proceed with this.

Note :

My Source and target is having around 1 million rows.

loveojha2 · Post by **loveojha2** » Mon Sep 11, 2006 3:07 am

If we mention KEY1 and KEY2 alone as key columns in the Hashed file , then the values can be overwritten

Instead of Hashed File use Universe Stage with Key1 and Key2 as keys not the date column.

Select multi row lookup as checked.

Once rows are looked up (more than one may be), check for the date condition outside the lookup, within the transformer (may be while writing to the link as link constraint.

Andal · Post by **Andal** » Tue Sep 12, 2006 9:55 pm

Hi Love

I have already tried with universe stage and it doesnt improve the performance much.

DSguru2B · Post by **DSguru2B** » Tue Sep 12, 2006 10:40 pm

The post that you have referenced, read it carefully. In that you only specify the dummy column (running number) as the key and no other key. This way you will retain each and every row and not lose any. Then you can use my derivation to get the range lookup.

kumar_s · Post by **kumar_s** » Tue Sep 12, 2006 10:45 pm

It is actually populating the records with the intermediate values between the to and from range. But again it will chalange the performace for the 1million row lookup. But is has advantage of equality predicate.

ray.wurlod · Post by **ray.wurlod** » Wed Sep 13, 2006 1:15 am

You might also contemplate indexing the search fields in the hashed file (the UV table).

Andal · Post by **Andal** » Wed Sep 13, 2006 1:16 am

DSguru2B wrote:The post that you have referenced, read it carefully. In that you only specify the dummy column (running number) as the key and no other key. This way you will retain each and every row and not lose any. Then you can use my derivation to get the range lookup.

If we match the @INROWNUM with dummy sequential key, I hope we wont get the desired results. Suppose for my first row from the source KEY1 and KEY2 are "AA" and "BB". And in the Hashed File the dummy keycolumn "1" may contain some other values for original lookup keys, something like "CC" and "DD".

This will fail in

Code: Select all

if  NOT(RefLink.NOTFOUND) AND main_id = lookup_id

Please explain.

DSguru2B · Post by **DSguru2B** » Wed Sep 13, 2006 11:01 am

The dummy key in the hashed file will be a running number, like 1,2,3...and so on. Inside the transformer, the expression of the key column will be @INROWNUM. This way each and every record will be looked up (NOTE: No matching with the source is done at this stage)
All the matching is done in the stage variables. In your case you have two keys so the derivation will be

Code: Select all

if  NOT(RefLink.NOTFOUND) AND SRC.KEY1= REF.KEY1 AND SRC.KEY2=REF.KEY2 and so on .....

Andal · Post by **Andal** » Wed Sep 13, 2006 9:31 pm

Please forgive me If i am so dumb to understand the logic. Let me explain my doubt with some example

Snapshot of my first 2 rows in Source

Code: Select all

Key1$Key2$EFF_DT
DD$EE$01/01/2005
GG$HH$01/01/2005

Snapshot of my Hashed File

Code: Select all

DummyKey$SRCMATCH1$SRCMATCH2$LKPDESC$EFF_START_DT$EFF_END_DT
1$CC$DD$DESC1$01/01/2005$12/31/2005
2$CC$DD$DESC2$01/01/2006$12/31/2006
3$DD$EE$DESC1$01/01/2005$12/31/2005
4$DD$EE$DESC2$01/01/2006$12/31/2006

I had shown some sample data above. In the Hashed file DummyKey is the only keycolumn. So from the source if the first row comes @INROWNUM will be 1, and if we do a match with dummykey alone it will find a match and it will return the first row.

So if we check something like

Code: Select all

if  NOT(RefLink.NOTFOUND) AND SRC.KEY1= REF.KEY1 AND SRC.KEY2=REF.KEY2

in the matching condition SRC.KEY1 = REF.KEY1 it will fail, but i am having a matching condition in the third row.

Please explain

DSguru2B · Post by **DSguru2B** » Wed Sep 13, 2006 10:44 pm

You are getting a matching record in the third row because both the keys match with the first record of your source sample data and the date is in between the start and end date. Isnt that what you want???

How it works is that with @INROWNUM in the expression of the hashed file key (dummy key), each and every row will be looked up, but what row goes across to the target link, will be decided by the stage variable. So basically you are putting the conditions in stage variables and constraining the output, hence twisting the logic to work like a range lookup.

Andal · Post by **Andal** » Wed Sep 13, 2006 11:40 pm

Thanks for the Reply, i will check it out by today.