Will Slowly Changing Dimension stage handle Delete ?
Moderators: chulett, rschirm, roy
Will Slowly Changing Dimension stage handle Delete ?
Hi there
Just i wish to know whether the Slowly Changing Dimension stage will handle delete?
If the source data deleted physically, I wish the Slowly Changing Dimension stage needs to close the record.
Did any one did this using Slowly Changing Dimension stage ?
DS User
Just i wish to know whether the Slowly Changing Dimension stage will handle delete?
If the source data deleted physically, I wish the Slowly Changing Dimension stage needs to close the record.
Did any one did this using Slowly Changing Dimension stage ?
DS User
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
My SCD Stage is not handling delete. If i delete a records from the source, I am expecting the reference data need to mark it as closed. But for some reason it is not happening!!
My first question is, if i delete a record from the source whether it should mark the record as "Close" in reference?
If it should close then I am not sure what mistake i am doing!
Let me explain what i did. If i missed something, please let me know.
In my source and reference i have 2 columns (id, name). Apart from these columns i have open date, close date, current ind flag columns and SKey too.
In SCD stage, Fast path 2
In SCD stage, Fast path 4
Job is properly keep the records as open / close when the records are new / changed. But not responding if the record deleted from the source.
Let me know, if i missed to notice anything!
DS User
My first question is, if i delete a record from the source whether it should mark the record as "Close" in reference?
If it should close then I am not sure what mistake i am doing!
Let me explain what i did. If i missed something, please let me know.
In my source and reference i have 2 columns (id, name). Apart from these columns i have open date, close date, current ind flag columns and SKey too.
In SCD stage, Fast path 2
Code: Select all
ID as the Business Key
Name as Type 2.
Open date (Effective Date (Type 2))
close date (Expiration Date (Type 2))
current ind (Current Indicator (type 2))
Code: Select all
Open date - CurrentTimeStamp()
Close date - '9999-12-31 00:00:00' in Derivation and CurrentTimeStamp() in Expire.
current ind - 1 in Derivation and 0 in Expire
Let me know, if i missed to notice anything!
DS User
-
- Premium Member
- Posts: 10
- Joined: Mon May 23, 2011 5:19 am
Did you get anywhere with this?SURA wrote:My SCD Stage is not handling delete. If i delete a records from the source, I am expecting the reference data need to mark it as closed. But for some reason it is not happening!!
My first question is, if i delete a record from the source whether it should mark the record as "Close" in reference?
If it should close then I am not sure what mistake i am doing!
DS User
I have the exact same issue, If I change a record in source, then the existing record is expired, and the new one is inserted. Also new source records are inserted.
However, if I delete a record from source, then there is no change to the dimension and the record is also not outputted to the fact table.
I've read in the manual about expired records, but table 67, "Dimension update action" doesn't mention a source record missing, only reference records missing.
I'm fine if this requires more than a single SCD stage, but if it does, I'd like to know the best approach.
Thanks
-
- Participant
- Posts: 80
- Joined: Sat Mar 18, 2006 10:27 pm
SCD Stage wouldn't support if source physically deletes the record on their side.
SCD works with Source file as the driver for the lookup process inside SCD to figure out the dimension changes.
The best way to identify those records are..anything which is on history and not on the source file would be your logical delete. this can be achieved before SCD using the lookup stage.
Based on your experience with SCD, what would be the second output of SCD really contains and How we can use that output?
Can I have just one output ( changes only) from SCD?
SCD works with Source file as the driver for the lookup process inside SCD to figure out the dimension changes.
The best way to identify those records are..anything which is on history and not on the source file would be your logical delete. this can be achieved before SCD using the lookup stage.
Based on your experience with SCD, what would be the second output of SCD really contains and How we can use that output?
Can I have just one output ( changes only) from SCD?
I also have Type 1 Dim with option of closing records that are not in the current load, but are present in the dimension table. I played a bit with SCD trying to set purpose codes on expiry date and current indicator fields, but Datastage requires having some Type 2 fields in such a case. This would turn my dimension into Type 2 which I don't want.
I have abandoned using SCD for that very reason (not handling deletes). Maybe it handles deletes, but I can't make it od it .
My next tought was to use Change Capture and Transformer stages instead (as in my previous projects), but in my case it's no good either as in case of updates I need both, the previous and the current version of the record.
So now I have Join (full outer) and Transformer.
I have abandoned using SCD for that very reason (not handling deletes). Maybe it handles deletes, but I can't make it od it .
My next tought was to use Change Capture and Transformer stages instead (as in my previous projects), but in my case it's no good either as in case of updates I need both, the previous and the current version of the record.
So now I have Join (full outer) and Transformer.
SCD is good for SCD type2.
I created 3 different jobs to check the performance.
1. SCD
2. Change Capture
3. Join
I can't find much difference between those jobs. So dosen't matter which approach going to use.
If you have more than 1 changes in the same load and it is type 2, then SCD is good.
Otherwise no difference.
In SCD stage to get Type2 , if you need to update the Modified batch id , I can't find any option in the SCD stage. So i used the TFM after SCD out and there i checked the Current Indicator and based on that status i loaded the Modified Batch ID column.
Another funny thing is, by default it is selecting all the reference colums as a key column.
DS User
I created 3 different jobs to check the performance.
1. SCD
2. Change Capture
3. Join
I can't find much difference between those jobs. So dosen't matter which approach going to use.
If you have more than 1 changes in the same load and it is type 2, then SCD is good.
Otherwise no difference.
In SCD stage to get Type2 , if you need to update the Modified batch id , I can't find any option in the SCD stage. So i used the TFM after SCD out and there i checked the Current Indicator and based on that status i loaded the Modified Batch ID column.
Another funny thing is, by default it is selecting all the reference colums as a key column.
DS User
-
- Participant
- Posts: 80
- Joined: Sat Mar 18, 2006 10:27 pm