Page 1 of 1

Will Slowly Changing Dimension stage handle Delete ?

Posted: Tue Sep 20, 2011 12:41 am
by SURA
Hi there

Just i wish to know whether the Slowly Changing Dimension stage will handle delete?

If the source data deleted physically, I wish the Slowly Changing Dimension stage needs to close the record.

Did any one did this using Slowly Changing Dimension stage ?

DS User

Posted: Tue Sep 20, 2011 1:42 am
by ray.wurlod
It's documented in the manual as being supported. For Type 2 the target record is marked as expired and no new version is created.

Posted: Tue Sep 20, 2011 6:13 am
by SURA
Then, there might be something wrong in my design. I deleted a record from the landing table, but the job fails to close that record.

Let me check it again.

Thanks Ray

DS User

Posted: Tue Sep 20, 2011 5:46 pm
by SURA
My SCD Stage is not handling delete. If i delete a records from the source, I am expecting the reference data need to mark it as closed. But for some reason it is not happening!!

My first question is, if i delete a record from the source whether it should mark the record as "Close" in reference?

If it should close then I am not sure what mistake i am doing!

Let me explain what i did. If i missed something, please let me know.

In my source and reference i have 2 columns (id, name). Apart from these columns i have open date, close date, current ind flag columns and SKey too.

In SCD stage, Fast path 2

Code: Select all

ID as the Business Key 
Name as Type 2. 
Open date (Effective Date (Type 2)) 
close date (Expiration Date (Type 2)) 
current ind (Current Indicator (type 2))
In SCD stage, Fast path 4

Code: Select all

Open date - CurrentTimeStamp()
Close date - '9999-12-31 00:00:00' in Derivation and CurrentTimeStamp() in Expire.
current ind  - 1 in Derivation and 0 in Expire
Job is properly keep the records as open / close when the records are new / changed. But not responding if the record deleted from the source.

Let me know, if i missed to notice anything!

DS User

Posted: Thu Oct 27, 2011 12:42 am
by eimhelper01
SURA wrote:My SCD Stage is not handling delete. If i delete a records from the source, I am expecting the reference data need to mark it as closed. But for some reason it is not happening!!

My first question is, if i delete a record from the source whether it should mark the record as "Close" in reference?

If it should close then I am not sure what mistake i am doing!

DS User
Did you get anywhere with this?

I have the exact same issue, If I change a record in source, then the existing record is expired, and the new one is inserted. Also new source records are inserted.

However, if I delete a record from source, then there is no change to the dimension and the record is also not outputted to the fact table.

I've read in the manual about expired records, but table 67, "Dimension update action" doesn't mention a source record missing, only reference records missing.

I'm fine if this requires more than a single SCD stage, but if it does, I'd like to know the best approach.

Thanks

Posted: Thu Oct 27, 2011 10:50 am
by venkatvelpula
SCD Stage wouldn't support if source physically deletes the record on their side.

SCD works with Source file as the driver for the lookup process inside SCD to figure out the dimension changes.

The best way to identify those records are..anything which is on history and not on the source file would be your logical delete. this can be achieved before SCD using the lookup stage.

Based on your experience with SCD, what would be the second output of SCD really contains and How we can use that output?

Can I have just one output ( changes only) from SCD?

Posted: Wed Nov 02, 2011 4:26 pm
by evee1
I also have Type 1 Dim with option of closing records that are not in the current load, but are present in the dimension table. I played a bit with SCD trying to set purpose codes on expiry date and current indicator fields, but Datastage requires having some Type 2 fields in such a case. This would turn my dimension into Type 2 which I don't want.

I have abandoned using SCD for that very reason (not handling deletes). Maybe it handles deletes, but I can't make it od it ;-).

My next tought was to use Change Capture and Transformer stages instead (as in my previous projects), but in my case it's no good either as in case of updates I need both, the previous and the current version of the record.

So now I have Join (full outer) and Transformer.

Posted: Wed Nov 02, 2011 5:46 pm
by SURA
SCD is good for SCD type2.

I created 3 different jobs to check the performance.

1. SCD
2. Change Capture
3. Join

I can't find much difference between those jobs. So dosen't matter which approach going to use.

If you have more than 1 changes in the same load and it is type 2, then SCD is good.

Otherwise no difference.

In SCD stage to get Type2 , if you need to update the Modified batch id , I can't find any option in the SCD stage. So i used the TFM after SCD out and there i checked the Current Indicator and based on that status i loaded the Modified Batch ID column.

Another funny thing is, by default it is selecting all the reference colums as a key column.

DS User

Posted: Wed Nov 02, 2011 6:40 pm
by evee1
I created 3 different jobs to check the performance.
May I know the data volumes you did this testing with?

Posted: Wed Nov 02, 2011 6:49 pm
by SURA
@ 25,000

DS User

Posted: Mon Nov 07, 2011 2:54 pm
by venkatvelpula
Did you use two outputs of SCD or just one? If not, what are the observations on the two inputs? what that really contains and how we can use?

Can we get updates separately and inserts separately using parallel SCD stage?