slow changing dimensions Type 2

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

sainath
Premium Member
Premium Member
Posts: 138
Joined: Fri Nov 19, 2004 3:57 pm

slow changing dimensions Type 2

Post by sainath »

Hi

I implemented slow changing dimensions type 2 in server jobs .where i updated and inserted the same records by designing two seperate jobs.


Update job where i have to update the exp date
insert the record


Requirement. IF SAME CODE AND DIFFERENT NAME
Code Name effdate expdate
AAA CANADA 2000-01-03 9999-12-31 (2002-12-31)
AAA AMERICA 2003-01-01 9999-12-31


I JUST WANT TO KNOW WHETHER WE CAN DO THIS IN SAME ONE JOB.

PLEASE REPLY .
SAI

[/b]
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: slow changing dimensions Type 2

Post by chulett »

sainath wrote:I JUST WANT TO KNOW WHETHER WE CAN DO THIS IN SAME ONE JOB.
Sure.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sainath
Premium Member
Premium Member
Posts: 138
Joined: Fri Nov 19, 2004 3:57 pm

Re: slow changing dimensions Type 2

Post by sainath »

sainath wrote:Hi

I implemented slow changing dimensions type 2 in server jobs .where i updated and inserted the same records by designing two seperate jobs.


Update job where i have to update the exp date
insert the record


Requirement. IF SAME CODE AND DIFFERENT NAME
Code Name effdate expdate
AAA CANADA 2000-01-03 9999-12-31 (2002-12-31)
AAA AMERICA 2003-01-01 9999-12-31


I JUST WANT TO KNOW HOW WE CAN DO THIS IN SAME ONE JOB.

PLEASE EXPLAIN IN DETAIL.
thks.
SAI

[/b]
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

If you take a closer look at your database stage you will see it supports many combinations of insert/update or update/insert or replace etc. You should be able to find a combination that handles both your inserts and updates. Be aware that your current design may be a lot more efficient. For example an update/insert job will take every row and try an update statement, if no matching target row exists then it inserts the row instead. This handles both your updates and inserts but it creates a lot of unnecessary update statements. The same problem with insert/update jobs. It is more efficient to split inserts and updates into seperate jobs or into different outputs of the one job.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Not really what they're after, Vince. It's a matter of both insert and update, not one or the other. Or at least the possibility of both. :wink:

Sai, look into combining your two jobs - literally. Two links from one transformer into one database stage will do the trick. You've hashed up your current keys so you know which records are new and which aren't, right? And whatever information you need to close the old record is there as well, yes?

Then it's simply a matter of constraints. No hit on the hashed file means a new record and you just do the insert. A hit means you still need to do the insert but you also need to update the 'old' record so that means a row down the update link as well.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

A Type 2 SCD should never need to UPDATE - you are always going to generate a new surrogate key value, and therefore you are always going to INSERT.

In loading dimension tables you need two tests.
  • The first is existence - does the key value already exist in the target table? For this to work you need to have set up an inverse map from the "business" keys found in the source data to the surrogate keys found in the dimension table. This map is typically loaded into a hashed file (server jobs) or Lookup File Set (parallel jobs).

    The second test is to determine whether the row has changed in any of the critical columns. For example, you are interested in preserving the history of name changes but possibly not in the history of telephone number changes (after all, you can't phone someone on their old number). If none of the critical columns has changed, then you discard the input row, on the basis that you already have its information.
Once all the dimension tables have been loaded, during which you should have preserved the "business" key to surrogate key mappings, then loading the fact tables is straightforward; you need to map from the "business" keys found in source data to the surrogate keys used in the star schema.

Again, you should only ever be appending new rows to fact tables, since their keys (logically at least) are combinations of the dimension tables' key values.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ray.wurlod wrote:A Type 2 SCD should never need to UPDATE - you are always going to generate a new surrogate key value, and therefore you are always going to INSERT.
Hmm... I'm going to have to beg to differ here. Typically, the Type 2 will include an effective date range and that is what is updated on the 'old' record and that is what I was assuming the OP is doing - closing out the old effective date range.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

OK, if you use those you will need to update. I prefer not to. Usually the "when" doesn't matter, in my experience.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
thumsup9
Charter Member
Charter Member
Posts: 168
Joined: Fri Feb 18, 2005 11:29 am

Post by thumsup9 »

Chulet,

In a situation where we have 2 same records with the same natural keys say consumer name.

Rec1 comes , you update the existing target record with EffEndDate -- StartDate-1 of new record
Rec2 comes, update the target record with EffDate--StartDate-1

Both the records go to insert but then I dont need both of them as active records in the target.

I always need only one record as active(which means Eff End Date 2999-12-31) in our case.

How do you handle a situation like this ?


Thanks,
Thums
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

You need two (2) output links with one being an insert of the new record properly defined as an insert (correct values etc...) and you need an update link that properly expires the old record (correct values etc..., but usually only the key, date(s) and active indicator are needed on this link since you are only updating the record).

You will need to know the key you are updating and the insert gets a new surrogate key.

This is very straight forward stuff and quite simple to accomplish in DataStage in one job. I would be happy to send you a simple dsx with a sample SCD job if you contact me offline.

Not to mention that I'm sure Ken Bland has posted the doc regarding SCD type 1,2 and 3 processing somewhere and this is a good document and outlines exactly what needs to be done.

Regards,
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Kim hosted it on his website for us. There is a post in the Site forum noting this, somewhere where it wouldn't immediately 'roll off'. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If you're prepared to wait for the next release :roll:, Hawk includes an SCD stage that does most of the work for you. 8)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
thumsup9
Charter Member
Charter Member
Posts: 168
Joined: Fri Feb 18, 2005 11:29 am

Post by thumsup9 »

Thanks for your response. Say we have 2 records with the same key from source

Customer ID
1000
1000

Now if we do a hash and we dont find them in Target. Both these records go as Inserts, but I need to make one of them Active and the other as Inactive in the target,

Pls let me know if you need more inputs..

Thanks,
Thums
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Well then, before sending it to the target for insert, you need to flag one of them as inactive and filter it out.
Like in one transformer flag it and in the second transformer constraint it.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
thumsup9
Charter Member
Charter Member
Posts: 168
Joined: Fri Feb 18, 2005 11:29 am

Post by thumsup9 »

Yes Guru, Looks like we are close. I need to identify among all the records which two records have same key, how do I do that here...

Thums
Post Reply