In this case there must be some other column in the records that determines which one will be active. The business rules for the values in this column need to be applied, allowing only one to be inserted as 'active'.
How are you indicating 'active'? Do you have a flag or are you using a date? In this case the flag would be easier, because presumably both records have the same date.
Need to know more about your requirements.
Rob
Slowly Changing Dimensions Type 2
Moderators: chulett, rschirm, roy
thumsup9 wrote: Now if we do a hash and we don't find them in Target. Both these records go as Inserts

Not necessarily. That's your choice based on the job design. If you need the first to go as an insert and the second as an update - update the hashed file!
In other words, write any hashed 'misses' into the same lookup hashed file so they will be there when the next matching key comes through. This does mean a couple of caveats - you cannot simply cache the lookup in memory any longer. Either disable that option or use 'Enabled - Locked for Updates' so it knows you will be playing that game.
Also note that, depending on your commit level, this may 'complicate' any restart of the job. Meaning, while an abort may roll back all of your database work, it won't roll back your hashed file changes. There are plenty of ways to handle that, however, if it is an issue.
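A minimal sketch of that "write the misses back" pattern, in Python standing in for the job logic (the dict plays the role of the hashed file; the field name `key` is just a placeholder):

```python
def classify(records, lookup):
    """Classify each record as insert or update against a lookup dict
    that acts like the hashed file; it is mutated as we go."""
    actions = []
    for rec in records:
        key = rec["key"]
        if key in lookup:
            # key already seen (in the target or earlier this run): update
            actions.append(("update", rec))
        else:
            # first time this key is seen: insert, and write the 'miss'
            # back into the lookup so the next matching key finds it
            actions.append(("insert", rec))
            lookup[key] = rec
    return actions
```

So two records with the same key in one run come out as one insert followed by one update, which is exactly the behavior being described.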
-craig
"You can never have too many knives" -- Logan Nine Fingers
thumsup9 wrote: I need to identify among all the records which two records have same key, how do I do that here...

A couple of ways: sort on the key, check for identical keys using stage variables, and then, depending on the deciding column (maybe a date column), flag the inactive one.
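A rough Python sketch of that sort-and-compare idea, assuming a date column decides which duplicate stays active (field names `key`, `date`, `active` are placeholders, not anything from the actual job):

```python
def flag_active(records):
    """Sort on key (then date), compare each key with the previous one --
    the stage-variable trick -- and flag earlier duplicates inactive."""
    recs = sorted(records, key=lambda r: (r["key"], r["date"]))
    flagged = []
    prev_key = None
    for rec in recs:
        rec = dict(rec, active=True)
        if rec["key"] == prev_key:
            # same key as the previous row: the earlier (older-dated)
            # record for this key becomes the inactive one
            flagged[-1]["active"] = False
        flagged.append(rec)
        prev_key = rec["key"]
    return flagged
```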
Or pass it through an aggregator, group on the key and apply the appropriate derivation on the date column.
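The aggregator route, sketched the same way (grouping on the key and taking the maximum of the date column as the derivation; again, the field names are just assumptions):

```python
from collections import defaultdict

def latest_per_key(records):
    """Group records on key; within each group the record with the
    max date wins, mimicking a group-by with a Max(date) derivation."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["key"]].append(rec)
    return {k: max(v, key=lambda r: r["date"]) for k, v in groups.items()}
```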
Go with Craig's solution. One of the most elegant ways.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
I guess you may have to commit for each insert into the dimension table and then update the hashed file with the current data from the dimension table, but not the history data.
So when you get the same record again, the lookup against the hashed file will find it, and the job will update the existing record and insert the current one.
You... could. It (the database work) would run much slower, however, so keep that in mind. All it really buys you, as far as I can see, is that your hashed file would be in sync with your database in the event of an abort, and it changes your restart/recovery process.
I prefer to not do it that way, but you could if you felt it was best.
-craig
"You can never have too many knives" -- Logan Nine Fingers
chulett wrote: You... could. It (the database work) would run much slower, however, so keep that in mind. All that really buys you that I can see is the fact that your hashed file would be in sync with your database in the event of an abort, and it changes your restart/recovery process. I prefer to not do it that way, but you could if you felt it was best.

Thanks for the suggestion, Craig.