Page 1 of 2

slow changing dimensions Type 2

Posted: Sat Aug 20, 2005 10:52 am
by sainath
Hi

I implemented slow changing dimensions type 2 in server jobs .where i updated and inserted the same records by designing two seperate jobs.


Update job where i have to update the exp date
insert the record


Requirement. IF SAME CODE AND DIFFERENT NAME
Code Name effdate expdate
AAA CANADA 2000-01-03 9999-12-31 (2002-12-31)
AAA AMERICA 2003-01-01 9999-12-31


I JUST WANT TO KNOW WHETHER WE CAN DO THIS IN SAME ONE JOB.

PLEASE REPLY .
SAI

[/b]

Re: slow changing dimensions Type 2

Posted: Sat Aug 20, 2005 6:53 pm
by chulett
sainath wrote:I JUST WANT TO KNOW WHETHER WE CAN DO THIS IN SAME ONE JOB.
Sure.

Re: slow changing dimensions Type 2

Posted: Sun Aug 21, 2005 8:40 pm
by sainath
sainath wrote:Hi

I implemented slow changing dimensions type 2 in server jobs .where i updated and inserted the same records by designing two seperate jobs.


Update job where i have to update the exp date
insert the record


Requirement. IF SAME CODE AND DIFFERENT NAME
Code Name effdate expdate
AAA CANADA 2000-01-03 9999-12-31 (2002-12-31)
AAA AMERICA 2003-01-01 9999-12-31


I JUST WANT TO KNOW HOW WE CAN DO THIS IN SAME ONE JOB.

PLEASE EXPLAIN IN DETAIL.
thks.
SAI

[/b]

Posted: Sun Aug 21, 2005 9:08 pm
by vmcburney
If you take a closer look at your database stage you will see it supports many combinations of insert/update or update/insert or replace etc. You should be able to find a combination that handles both your inserts and updates. Be aware that your current design may be a lot more efficient. For example an update/insert job will take every row and try an update statement, if no matching target row exists then it inserts the row instead. This handles both your updates and inserts but it creates a lot of unnecessary update statements. The same problem with insert/update jobs. It is more efficient to split inserts and updates into seperate jobs or into different outputs of the one job.

Posted: Sun Aug 21, 2005 11:04 pm
by chulett
Not really what they're after, Vince. It's a matter of both insert and update, not one or the other. Or at least the possibility of both. :wink:

Sai, look into combining your two jobs - literally. Two links from one transformer into one database stage will do the trick. You've hashed up your current keys so you know which records are new and which aren't, right? And whatever information you need to close the old record is there as well, yes?

Then it's simply a matter of constraints. No hit on the hashed file means a new record and you just do the insert. A hit means you still need to do the insert but you also need to update the 'old' record so that means a row down the update link as well.

Posted: Mon Aug 22, 2005 5:22 pm
by ray.wurlod
A Type 2 SCD should never need to UPDATE - you are always going to generate a new surrogate key value, and therefore you are always going to INSERT.

In loading dimension tables you need two tests.
  • The first is existence - does the key value already exist in the target table? For this to work you need to have set up an inverse map from the "business" keys found in the source data to the surrogate keys found in the dimension table. This map is typically loaded into a hashed file (server jobs) or Lookup File Set (parallel jobs).

    The second test is to determine whether the row has changed in any of the critical columns. For example, you are interested in preserving the history of name changes but possibly not in the history of telephone number changes (after all, you can't phone someone on their old number). If none of the critical columns has changed, then you discard the input row, on the basis that you already have its information.
Once all the dimension tables have been loaded, during which you should have preserved the "business" key to surrogate key mappings, then loading the fact tables is straightforward; you need to map from the "business" keys found in source data to the surrogate keys used in the star schema.

Again, you should only ever be appending new rows to fact tables, since their keys (logically at least) are combinations of the dimension tables' key values.

Posted: Mon Aug 22, 2005 6:06 pm
by chulett
ray.wurlod wrote:A Type 2 SCD should never need to UPDATE - you are always going to generate a new surrogate key value, and therefore you are always going to INSERT.
Hmm... I'm going to have to beg to differ here. Typically, the Type 2 will include an effective date range and that is what is updated on the 'old' record and that is what I was assuming the OP is doing - closing out the old effective date range.

Posted: Mon Aug 22, 2005 6:14 pm
by ray.wurlod
OK, if you use those you will need to update. I prefer not to. Usually the "when" doesn't matter, in my experience.

Posted: Wed Jul 12, 2006 3:24 pm
by thumsup9
Chulet,

In a situation where we have 2 same records with the same natural keys say consumer name.

Rec1 comes , you update the existing target record with EffEndDate -- StartDate-1 of new record
Rec2 comes, update the target record with EffDate--StartDate-1

Both the records go to insert but then I dont need both of them as active records in the target.

I always need only one record as active(which means Eff End Date 2999-12-31) in our case.

How do you handle a situation like this ?


Thanks,
Thums

Posted: Wed Jul 12, 2006 3:37 pm
by mhester
You need two (2) output links with one being an insert of the new record properly defined as an insert (correct values etc...) and you need an update link that properly expires the old record (correct values etc..., but usually only the key, date(s) and active indicator are needed on this link since you are only updating the record).

You will need to know the key you are updating and the insert gets a new surrogate key.

This is very straight forward stuff and quite simple to accomplish in DataStage in one job. I would be happy to send you a simple dsx with a sample SCD job if you contact me offline.

Not to mention that I'm sure Ken Bland has posted the doc regarding SCD type 1,2 and 3 processing somewhere and this is a good document and outlines exactly what needs to be done.

Regards,

Posted: Wed Jul 12, 2006 4:37 pm
by chulett
Kim hosted it on his website for us. There is a post in the Site forum noting this, somewhere where it wouldn't immediately 'roll off'. :wink:

Posted: Wed Jul 12, 2006 8:23 pm
by ray.wurlod
If you're prepared to wait for the next release :roll:, Hawk includes an SCD stage that does most of the work for you. 8)

Posted: Thu Jul 13, 2006 7:23 am
by thumsup9
Thanks for your response. Say we have 2 records with the same key from source

Customer ID
1000
1000

Now if we do a hash and we dont find them in Target. Both these records go as Inserts, but I need to make one of them Active and the other as Inactive in the target,

Pls let me know if you need more inputs..

Thanks,
Thums

Posted: Thu Jul 13, 2006 7:27 am
by DSguru2B
Well then, before sending it to the target for insert, you need to flag one of them as inactive and filter it out.
Like in one transformer flag it and in the second transformer constraint it.

Posted: Thu Jul 13, 2006 7:35 am
by thumsup9
Yes Guru, Looks like we are close. I need to identify among all the records which two records have same key, how do I do that here...

Thums