keylookup with date between effective_dt and expiration_dt

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
mczern
Premium Member
Premium Member
Posts: 80
Joined: Thu Jan 20, 2005 8:38 am
Location: Atlanta, GA
Contact:

keylookup with date between effective_dt and expiration_dt

Post by mczern »

Being new to Enterprise Edition I can't figure how to perform a lookup against a type 2 slowly changing dimension table. I need to perform a lookup to return a surrogate key against a table whose natural key value is id, effective_dt, expiration_dt. Has anyone implemented a solution for doing a keylookup against this type of dimension? The lookup should effectively return the surrogate key where id_value = lkp.id and trans_date between lkp.effective_dt and lkp.expiration_dt (or trans_date >= lkp.effective_dt and trans_date < lkp.expiration_dt). Seems the lookup transform only allows inner joins to the lookup table/file.

TIA
Mike Czerniawski
tardifma
Premium Member
Premium Member
Posts: 23
Joined: Tue Jan 24, 2006 10:53 am

Post by tardifma »

Hi mczern.

Type 2 slowly changing dimension is really common in the data warehousing world!!!

If the data you're loading correspond to current data, it's easy... all you need to do is extract only the current image of your dimension and lookup against the key... in other words, in the SQL statement that extract the data of your type 2 dimension, you use #current_date# between start_date and end_date.... and in your lookup, you match only the key.

But if you are loading historical data... and some records of your sources are related to old images of your dimension... I don't think there is an easy way to retreive the good record...

Hope this will help.
The Brute
mczern
Premium Member
Premium Member
Posts: 80
Joined: Thu Jan 20, 2005 8:38 am
Location: Atlanta, GA
Contact:

Post by mczern »

tardifma wrote:Hi mczern.

Type 2 slowly changing dimension is really common in the data warehousing world!!!

If the data you're loading correspond to current data, it's easy... all you need to do is extract only the current image of your dimension and lookup against the key... in other words, in the SQL statement that extract the data of your type 2 dimension, you use #current_date# between start_date and end_date.... and in your lookup, you match only the key.

But if you are loading historical data... and some records of your sources are related to old images of your dimension... I don't think there is an easy way to retreive the good record...

Hope this will help.
Unfortunately for me, I'm dealing with the historical version. Transaction dates vary for each incremental load making selection of just the current record id a non-option.
Mike Czerniawski
gbusson
Participant
Posts: 98
Joined: Fri Oct 07, 2005 2:50 am
Location: France
Contact:

Post by gbusson »

I've already seen this problem.

My solution was ( i'm not sure it was the most efficient, but it worked and i implemented it on schedule) :

a join between 2 flows with the key
- your flow
- the lookup flow ( contains, id, effective_dt and expiration_dt)


then you filter the data and you keep the record whose date is between effective_dt and expiration_dt.

You solve the problem of historical data.

the lookup allows outer joins but not theta joins!
tardifma
Premium Member
Premium Member
Posts: 23
Joined: Tue Jan 24, 2006 10:53 am

Post by tardifma »

Hi Mike.
I just heard that a new stage will come up in the next version of Datastage... it's called Lookup Range... Maybe it could help you with your problem...

Maybe you could contact an Ascential's sales rep...

Math
The Brute
tardifma
Premium Member
Premium Member
Posts: 23
Joined: Tue Jan 24, 2006 10:53 am

Post by tardifma »

Hi gbusson.
I think your solution would work with a non-large dimension...
As soon as the volume will grow... the performance would be brutal...

But good hint.
The Brute
gbusson
Participant
Posts: 98
Joined: Fri Oct 07, 2005 2:50 am
Location: France
Contact:

Post by gbusson »

well, i did this for a very large dimension! :)

it was not so brutal, besause the join does not consume very much CPU.

The partition and the sort do, so u have to be very careful ! :idea: put your lookup in a well partitionnend and sorted dataset in another job before.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Could you pre-populate a Lookup File Set with just those rows that satisfy the range restriction, then do the lookup solely on the natural key?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi,

If you are ready to use Lookup stage, you can perform range lookup from the same. You can make use of the 'condition' option avaialable in the lookup stage and check for your condition trans_date >= lkp.effective_dt and trans_date < lkp.expiration_dt :D

-Kumar
mczern
Premium Member
Premium Member
Posts: 80
Joined: Thu Jan 20, 2005 8:38 am
Location: Atlanta, GA
Contact:

Post by mczern »

kumar_s wrote:Hi,

If you are ready to use Lookup stage, you can perform range lookup from the same. You can make use of the 'condition' option avaialable in the lookup stage and check for your condition trans_date >= lkp.effective_dt and trans_date < lkp.expiration_dt :D

-Kumar
:?
I'm not sure I understand your reply... If I select 'Input Column' inside the Condition I can see the trans_date column coming from the input flow. However, I don't see the lkp.effective_dt or lkp.expiration_dt that is coming from the lookup table input into the transform. If I try to type it into the expression, I get an error stating that lkp.effective_dt is not found.
Mike Czerniawski
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

That's because your input links are segregated from each other. I cannot use the data from one input link as a condition in another input link. The output link on the lookup does not allow a condition. All you can really do using this method is to allow the lookup to create a cartesian product (because it will pickup more than one validity period) and then use a transformer or a filter stage to narrow your records with the condition that kumar suggested.

It would be nice if the output of the lookup allowed for a condition.
Post Reply