slow changing dimensions Type 2
Moderators: chulett, rschirm, roy
slow changing dimensions Type 2
Hi
I implemented slow changing dimensions type 2 in server jobs .where i updated and inserted the same records by designing two seperate jobs.
Update job where i have to update the exp date
insert the record
Requirement. IF SAME CODE AND DIFFERENT NAME
Code Name effdate expdate
AAA CANADA 2000-01-03 9999-12-31 (2002-12-31)
AAA AMERICA 2003-01-01 9999-12-31
I JUST WANT TO KNOW WHETHER WE CAN DO THIS IN SAME ONE JOB.
PLEASE REPLY .
SAI
[/b]
I implemented slow changing dimensions type 2 in server jobs .where i updated and inserted the same records by designing two seperate jobs.
Update job where i have to update the exp date
insert the record
Requirement. IF SAME CODE AND DIFFERENT NAME
Code Name effdate expdate
AAA CANADA 2000-01-03 9999-12-31 (2002-12-31)
AAA AMERICA 2003-01-01 9999-12-31
I JUST WANT TO KNOW WHETHER WE CAN DO THIS IN SAME ONE JOB.
PLEASE REPLY .
SAI
[/b]
Re: slow changing dimensions Type 2
Sure.sainath wrote:I JUST WANT TO KNOW WHETHER WE CAN DO THIS IN SAME ONE JOB.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Re: slow changing dimensions Type 2
sainath wrote:Hi
I implemented slow changing dimensions type 2 in server jobs .where i updated and inserted the same records by designing two seperate jobs.
Update job where i have to update the exp date
insert the record
Requirement. IF SAME CODE AND DIFFERENT NAME
Code Name effdate expdate
AAA CANADA 2000-01-03 9999-12-31 (2002-12-31)
AAA AMERICA 2003-01-01 9999-12-31
I JUST WANT TO KNOW HOW WE CAN DO THIS IN SAME ONE JOB.
PLEASE EXPLAIN IN DETAIL.
thks.
SAI
[/b]
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
If you take a closer look at your database stage you will see it supports many combinations of insert/update or update/insert or replace etc. You should be able to find a combination that handles both your inserts and updates. Be aware that your current design may be a lot more efficient. For example an update/insert job will take every row and try an update statement, if no matching target row exists then it inserts the row instead. This handles both your updates and inserts but it creates a lot of unnecessary update statements. The same problem with insert/update jobs. It is more efficient to split inserts and updates into seperate jobs or into different outputs of the one job.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
Not really what they're after, Vince. It's a matter of both insert and update, not one or the other. Or at least the possibility of both.
Sai, look into combining your two jobs - literally. Two links from one transformer into one database stage will do the trick. You've hashed up your current keys so you know which records are new and which aren't, right? And whatever information you need to close the old record is there as well, yes?
Then it's simply a matter of constraints. No hit on the hashed file means a new record and you just do the insert. A hit means you still need to do the insert but you also need to update the 'old' record so that means a row down the update link as well.
Sai, look into combining your two jobs - literally. Two links from one transformer into one database stage will do the trick. You've hashed up your current keys so you know which records are new and which aren't, right? And whatever information you need to close the old record is there as well, yes?
Then it's simply a matter of constraints. No hit on the hashed file means a new record and you just do the insert. A hit means you still need to do the insert but you also need to update the 'old' record so that means a row down the update link as well.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
A Type 2 SCD should never need to UPDATE - you are always going to generate a new surrogate key value, and therefore you are always going to INSERT.
In loading dimension tables you need two tests.
Again, you should only ever be appending new rows to fact tables, since their keys (logically at least) are combinations of the dimension tables' key values.
In loading dimension tables you need two tests.
- The first is existence - does the key value already exist in the target table? For this to work you need to have set up an inverse map from the "business" keys found in the source data to the surrogate keys found in the dimension table. This map is typically loaded into a hashed file (server jobs) or Lookup File Set (parallel jobs).
The second test is to determine whether the row has changed in any of the critical columns. For example, you are interested in preserving the history of name changes but possibly not in the history of telephone number changes (after all, you can't phone someone on their old number). If none of the critical columns has changed, then you discard the input row, on the basis that you already have its information.
Again, you should only ever be appending new rows to fact tables, since their keys (logically at least) are combinations of the dimension tables' key values.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Hmm... I'm going to have to beg to differ here. Typically, the Type 2 will include an effective date range and that is what is updated on the 'old' record and that is what I was assuming the OP is doing - closing out the old effective date range.ray.wurlod wrote:A Type 2 SCD should never need to UPDATE - you are always going to generate a new surrogate key value, and therefore you are always going to INSERT.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Chulet,
In a situation where we have 2 same records with the same natural keys say consumer name.
Rec1 comes , you update the existing target record with EffEndDate -- StartDate-1 of new record
Rec2 comes, update the target record with EffDate--StartDate-1
Both the records go to insert but then I dont need both of them as active records in the target.
I always need only one record as active(which means Eff End Date 2999-12-31) in our case.
How do you handle a situation like this ?
Thanks,
Thums
In a situation where we have 2 same records with the same natural keys say consumer name.
Rec1 comes , you update the existing target record with EffEndDate -- StartDate-1 of new record
Rec2 comes, update the target record with EffDate--StartDate-1
Both the records go to insert but then I dont need both of them as active records in the target.
I always need only one record as active(which means Eff End Date 2999-12-31) in our case.
How do you handle a situation like this ?
Thanks,
Thums
You need two (2) output links with one being an insert of the new record properly defined as an insert (correct values etc...) and you need an update link that properly expires the old record (correct values etc..., but usually only the key, date(s) and active indicator are needed on this link since you are only updating the record).
You will need to know the key you are updating and the insert gets a new surrogate key.
This is very straight forward stuff and quite simple to accomplish in DataStage in one job. I would be happy to send you a simple dsx with a sample SCD job if you contact me offline.
Not to mention that I'm sure Ken Bland has posted the doc regarding SCD type 1,2 and 3 processing somewhere and this is a good document and outlines exactly what needs to be done.
Regards,
You will need to know the key you are updating and the insert gets a new surrogate key.
This is very straight forward stuff and quite simple to accomplish in DataStage in one job. I would be happy to send you a simple dsx with a sample SCD job if you contact me offline.
Not to mention that I'm sure Ken Bland has posted the doc regarding SCD type 1,2 and 3 processing somewhere and this is a good document and outlines exactly what needs to be done.
Regards,
Mike Hester
mhester@petra-ps.com
mhester@petra-ps.com
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Thanks for your response. Say we have 2 records with the same key from source
Customer ID
1000
1000
Now if we do a hash and we dont find them in Target. Both these records go as Inserts, but I need to make one of them Active and the other as Inactive in the target,
Pls let me know if you need more inputs..
Thanks,
Thums
Customer ID
1000
1000
Now if we do a hash and we dont find them in Target. Both these records go as Inserts, but I need to make one of them Active and the other as Inactive in the target,
Pls let me know if you need more inputs..
Thanks,
Thums