Date Range Lookups

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
Andal
Participant
Posts: 124
Joined: Thu Dec 02, 2004 6:24 am
Location: Bangalore, India

Date Range Lookups

Post by Andal »

Hi All

I want to perform a date range lookup Something like

Code: Select all

Source.key1 = lkp.key1 and
source.key2 = lkp.key2 and
lkp.EFF_START_DT <= Source.EFF_DT and 
lkp.EFF_END_DT >= Source.EFF_DT 
Now we are doing a lookup directly with the database. But this affects the performance very much. we are getting 10+ rows per second only.

And based upon the suggestions from this forum, i tried using Universe Stage also, but it doesnt improve the performance much.

And in some other posts viewtopic.php?t=102858, DSguru2B has mentioned that we can do this with the Hashed File by intorducing a dummy Key Filed and match it with @INROWNUM

Code: Select all

if  NOT(RefLink.NOTFOUND) AND main_id = lookup_id AND ICONV(dt_record,"D/MDY[2,2,4]") >= ICONV(dt_start,"D/MDY[2,2,4]") AND ICONV(dt_record,"D/MDY[2,2,4]") <= ICONV(dt_end,"D/MDY[2,2,4]") then 'P' else 'N' 
My Question is, how this logic will work.
1) If we mention KEY1 and KEY2 alone as key columns in the Hashed file , then the values can be overwritten.

Code: Select all

For Ex.

Key1$Key2$Desc$Eff_Start_Dt$Eff_End_Dt
A$B$TestDesc$01/01/2005$12/31/2005 
A$B$TestDesc1$01/01/2006$12/31/2006
We will have the Second row only in the Hashed File if we make KEY1 and KEY2 as keycolumns.

2) If we make the Date fields also as KeyColumns and if we pass EFF_DT form the source, the lookup wont match because it will try to do a equivalent match.

3) This will be same case, if we introduce a dummy sequential key field in Hashed file and match it with the @INROWNUM.

Please Suggest me how to proceed with this.

Note :

My Source and target is having around 1 million rows.
Rgds
Anand
loveojha2
Participant
Posts: 362
Joined: Thu May 26, 2005 12:59 am

Post by loveojha2 »

If we mention KEY1 and KEY2 alone as key columns in the Hashed file , then the values can be overwritten
Instead of Hashed File use Universe Stage with Key1 and Key2 as keys not the date column.

Select multi row lookup as checked.

Once rows are looked up (more than one may be), check for the date condition outside the lookup, within the transformer (may be while writing to the link as link constraint.
Success consists of getting up just one more time than you fall.
Andal
Participant
Posts: 124
Joined: Thu Dec 02, 2004 6:24 am
Location: Bangalore, India

Post by Andal »

Hi Love

I have already tried with universe stage and it doesnt improve the performance much.
Rgds
Anand
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

The post that you have referenced, read it carefully. In that you only specify the dummy column (running number) as the key and no other key. This way you will retain each and every row and not lose any. Then you can use my derivation to get the range lookup.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

It is actually populating the records with the intermediate values between the to and from range. But again it will chalange the performace for the 1million row lookup. But is has advantage of equality predicate.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You might also contemplate indexing the search fields in the hashed file (the UV table).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Andal
Participant
Posts: 124
Joined: Thu Dec 02, 2004 6:24 am
Location: Bangalore, India

Post by Andal »

DSguru2B wrote:The post that you have referenced, read it carefully. In that you only specify the dummy column (running number) as the key and no other key. This way you will retain each and every row and not lose any. Then you can use my derivation to get the range lookup.
If we match the @INROWNUM with dummy sequential key, I hope we wont get the desired results. Suppose for my first row from the source KEY1 and KEY2 are "AA" and "BB". And in the Hashed File the dummy keycolumn "1" may contain some other values for original lookup keys, something like "CC" and "DD".

This will fail in

Code: Select all

if  NOT(RefLink.NOTFOUND) AND main_id = lookup_id
Please explain.
Rgds
Anand
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

The dummy key in the hashed file will be a running number, like 1,2,3...and so on. Inside the transformer, the expression of the key column will be @INROWNUM. This way each and every record will be looked up (NOTE: No matching with the source is done at this stage)
All the matching is done in the stage variables. In your case you have two keys so the derivation will be

Code: Select all

if  NOT(RefLink.NOTFOUND) AND SRC.KEY1= REF.KEY1 AND SRC.KEY2=REF.KEY2 and so on .....
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Andal
Participant
Posts: 124
Joined: Thu Dec 02, 2004 6:24 am
Location: Bangalore, India

Post by Andal »

Please forgive me If i am so dumb to understand the logic. Let me explain my doubt with some example

Snapshot of my first 2 rows in Source

Code: Select all

Key1$Key2$EFF_DT
DD$EE$01/01/2005
GG$HH$01/01/2005
Snapshot of my Hashed File

Code: Select all

DummyKey$SRCMATCH1$SRCMATCH2$LKPDESC$EFF_START_DT$EFF_END_DT
1$CC$DD$DESC1$01/01/2005$12/31/2005
2$CC$DD$DESC2$01/01/2006$12/31/2006
3$DD$EE$DESC1$01/01/2005$12/31/2005
4$DD$EE$DESC2$01/01/2006$12/31/2006
I had shown some sample data above. In the Hashed file DummyKey is the only keycolumn. So from the source if the first row comes @INROWNUM will be 1, and if we do a match with dummykey alone it will find a match and it will return the first row.

So if we check something like

Code: Select all

if  NOT(RefLink.NOTFOUND) AND SRC.KEY1= REF.KEY1 AND SRC.KEY2=REF.KEY2
in the matching condition SRC.KEY1 = REF.KEY1 it will fail, but i am having a matching condition in the third row.

Please explain
Rgds
Anand
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

You are getting a matching record in the third row because both the keys match with the first record of your source sample data and the date is in between the start and end date. Isnt that what you want???

How it works is that with @INROWNUM in the expression of the hashed file key (dummy key), each and every row will be looked up, but what row goes across to the target link, will be decided by the stage variable. So basically you are putting the conditions in stage variables and constraining the output, hence twisting the logic to work like a range lookup.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Andal
Participant
Posts: 124
Joined: Thu Dec 02, 2004 6:24 am
Location: Bangalore, India

Post by Andal »

Thanks for the Reply, i will check it out by today.
Rgds
Anand
Post Reply