slow changing dimensions Type 2

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

rwierdsm
Premium Member
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada
Contact:

Post by rwierdsm »

In this case there must be some other column in the records that determines which one will be active. The business rules for the values in this column need to be applied, allowing only one to be inserted as 'active'.

How are you indicating 'active'? Do you have a flag or are you using a date? In this case the flag would be easier, because presumably both records have the same date.

Need to know more about your requirements.

Rob
Rob Wierdsma
Toronto, Canada
bartonbishop.com
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

thumsup9 wrote:Now if we do a hash and we dont find them in Target. Both these records go as Inserts
Not necessarily. That's your choice based on the job design. If you need the first to go as an insert and the second as an update - update the hashed file!

It other words, write any hashed 'misses' into the same lookup hashed file so it will be there when the next matching key comes through. This does mean a couple of caveats - you cannot simple cache the lookup in memory any longer. Either disable that option or use 'Enabled - Locked for Updates' so it knows you will be playing that game.

Also note that, depending on your commit level, this may 'complicate' any restart of the job. Meaning, while an abort may roll back all of your database work, it won't roll back your hashed file changes. But there are plenty of ways to handle that however, if it is an issue. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

thumsup9 wrote:I need to identify among all the records which two records have same key, how do I do that here...
A couple of ways, sort on key, and chech for identical keys by using stage variables and then depending upon the deciding column (maybe a date column) flag the inactive one.
Pass it through aggreagator, group on key and apply the appropriate derivation on the date column.
Go for Craigs solution. One of the most elegant ways. :wink:
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
iwin
Premium Member
Premium Member
Posts: 99
Joined: Mon Apr 11, 2005 9:20 pm

Post by iwin »

I guess u may have to commit for each insert in the dimension table and then updated the hash file with the current data from dimension table but not the history data.
So we you get the same record, it will look for the hash and will find it and then makes an update for existing record and inserts the current record.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There's no reason to 'commit for each insert'.
-craig

"You can never have too many knives" -- Logan Nine Fingers
iwin
Premium Member
Premium Member
Posts: 99
Joined: Mon Apr 11, 2005 9:20 pm

Post by iwin »

chulett wrote:There's no reason to 'commit for each insert'.
Well, i thought we will be able to update the hash file with the current data if we commit for each record in the target table.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You... could. It (the database work) would run much slower, however, so keep that in mind. All that really buys you that I can see is the fact that your hashed file would be in sync with your database in the event of an abort, and it changes your restart/recovery process.

I prefer to not do it that way, but you could if you felt it was best.
-craig

"You can never have too many knives" -- Logan Nine Fingers
iwin
Premium Member
Premium Member
Posts: 99
Joined: Mon Apr 11, 2005 9:20 pm

Post by iwin »

chulett wrote:You... could. It (the database work) would run much slower, however, so keep that in mind. All that really buys you that I can see is the fact that your hashed file would be in sync with your database in the event of an abort, and it changes your restart/recovery process.

I prefer to not do it that way, but you could if you felt it was best.
Thanks for the Suggestion Craig.
Post Reply