Will Slowly Changing Dimension stage handle Delete ?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Will Slowly Changing Dimension stage handle Delete ?

Post by SURA »

Hi there

Just i wish to know whether the Slowly Changing Dimension stage will handle delete?

If the source data deleted physically, I wish the Slowly Changing Dimension stage needs to close the record.

Did any one did this using Slowly Changing Dimension stage ?

DS User
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It's documented in the manual as being supported. For Type 2 the target record is marked as expired and no new version is created.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

Then, there might be something wrong in my design. I deleted a record from the landing table, but the job fails to close that record.

Let me check it again.

Thanks Ray

DS User
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

My SCD Stage is not handling delete. If i delete a records from the source, I am expecting the reference data need to mark it as closed. But for some reason it is not happening!!

My first question is, if i delete a record from the source whether it should mark the record as "Close" in reference?

If it should close then I am not sure what mistake i am doing!

Let me explain what i did. If i missed something, please let me know.

In my source and reference i have 2 columns (id, name). Apart from these columns i have open date, close date, current ind flag columns and SKey too.

In SCD stage, Fast path 2

Code: Select all

ID as the Business Key 
Name as Type 2. 
Open date (Effective Date (Type 2)) 
close date (Expiration Date (Type 2)) 
current ind (Current Indicator (type 2))
In SCD stage, Fast path 4

Code: Select all

Open date - CurrentTimeStamp()
Close date - '9999-12-31 00:00:00' in Derivation and CurrentTimeStamp() in Expire.
current ind  - 1 in Derivation and 0 in Expire
Job is properly keep the records as open / close when the records are new / changed. But not responding if the record deleted from the source.

Let me know, if i missed to notice anything!

DS User
eimhelper01
Premium Member
Premium Member
Posts: 10
Joined: Mon May 23, 2011 5:19 am

Post by eimhelper01 »

SURA wrote:My SCD Stage is not handling delete. If i delete a records from the source, I am expecting the reference data need to mark it as closed. But for some reason it is not happening!!

My first question is, if i delete a record from the source whether it should mark the record as "Close" in reference?

If it should close then I am not sure what mistake i am doing!

DS User
Did you get anywhere with this?

I have the exact same issue, If I change a record in source, then the existing record is expired, and the new one is inserted. Also new source records are inserted.

However, if I delete a record from source, then there is no change to the dimension and the record is also not outputted to the fact table.

I've read in the manual about expired records, but table 67, "Dimension update action" doesn't mention a source record missing, only reference records missing.

I'm fine if this requires more than a single SCD stage, but if it does, I'd like to know the best approach.

Thanks
venkatvelpula
Participant
Posts: 80
Joined: Sat Mar 18, 2006 10:27 pm

Post by venkatvelpula »

SCD Stage wouldn't support if source physically deletes the record on their side.

SCD works with Source file as the driver for the lookup process inside SCD to figure out the dimension changes.

The best way to identify those records are..anything which is on history and not on the source file would be your logical delete. this can be achieved before SCD using the lookup stage.

Based on your experience with SCD, what would be the second output of SCD really contains and How we can use that output?

Can I have just one output ( changes only) from SCD?
evee1
Premium Member
Premium Member
Posts: 96
Joined: Tue Oct 06, 2009 4:17 pm
Location: Melbourne, AU

Post by evee1 »

I also have Type 1 Dim with option of closing records that are not in the current load, but are present in the dimension table. I played a bit with SCD trying to set purpose codes on expiry date and current indicator fields, but Datastage requires having some Type 2 fields in such a case. This would turn my dimension into Type 2 which I don't want.

I have abandoned using SCD for that very reason (not handling deletes). Maybe it handles deletes, but I can't make it od it ;-).

My next tought was to use Change Capture and Transformer stages instead (as in my previous projects), but in my case it's no good either as in case of updates I need both, the previous and the current version of the record.

So now I have Join (full outer) and Transformer.
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

SCD is good for SCD type2.

I created 3 different jobs to check the performance.

1. SCD
2. Change Capture
3. Join

I can't find much difference between those jobs. So dosen't matter which approach going to use.

If you have more than 1 changes in the same load and it is type 2, then SCD is good.

Otherwise no difference.

In SCD stage to get Type2 , if you need to update the Modified batch id , I can't find any option in the SCD stage. So i used the TFM after SCD out and there i checked the Current Indicator and based on that status i loaded the Modified Batch ID column.

Another funny thing is, by default it is selecting all the reference colums as a key column.

DS User
evee1
Premium Member
Premium Member
Posts: 96
Joined: Tue Oct 06, 2009 4:17 pm
Location: Melbourne, AU

Post by evee1 »

I created 3 different jobs to check the performance.
May I know the data volumes you did this testing with?
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

@ 25,000

DS User
venkatvelpula
Participant
Posts: 80
Joined: Sat Mar 18, 2006 10:27 pm

Post by venkatvelpula »

Did you use two outputs of SCD or just one? If not, what are the observations on the two inputs? what that really contains and how we can use?

Can we get updates separately and inserts separately using parallel SCD stage?
Post Reply