Page 1 of 1

What is SCD.How we in Data Stage

Posted: Tue May 13, 2008 1:24 pm
by Rajesh.venkat
:?:

Re: What is SCD.How we in Data Stage

Posted: Tue May 13, 2008 3:23 pm
by vivekgadwal
Rajesh.venkat wrote: :?:
SCD means Slowly Changing Dimensions. There is a lot of learning material available online on this topic.
To be terse, Slowly Changing Dimension problem is a very common problem in Data Warehouses when an attribute for a row changes over time albeit over a long period. That means, you have loaded some data in a dimension table. Now, after some time, you get a changed attribute for one of the rows. What would you do in that scenario?
There are multiple ways to solve this. However, here are the three main types:

Type1: The new record replaces the original record. No trace of the old record exists.

Type2: A new record is added into the customer dimension table with an Active flag or something. Therefore, the customer is treated essentially as two people. This is the most popular method.

Type3: The original record is modified to reflect the change. Essentially, this method requires adding a new column to reflect the change. This is very seldom used.

Hope this helps. But, to learn more, just google it!!!!

Posted: Tue May 13, 2008 4:22 pm
by ray.wurlod
Type 4 : Like Type 2, but the history is maintained in a separate table.

Posted: Tue May 13, 2008 4:26 pm
by vivekgadwal
ray.wurlod wrote:Type 4 : Like Type 2, but the history is maintained in a separate table.
Ray,

I didn't know that there is a Type 4 SCD. Is this adopted often in companies? So far, I have been doing Type 2 with an Active Flag to denote which is the most recent record!

Thanks,

Posted: Tue May 13, 2008 5:01 pm
by ray.wurlod
It's a bad day when you don't learn anything!

As to how widely it is used, I have no idea. I have seen it on a couple of sites where data volumes are large, and the vast majority of queries were not interested in non-current data - the idea of Type 4 is to speed those queries.

Posted: Tue May 13, 2008 5:08 pm
by vivekgadwal
ray.wurlod wrote:It's a bad day when you don't learn anything!
I know and I haven't come across this Type 4 anywhere...so may be i have to keep an eye on it! :D

Thanks for the reply...

Posted: Wed May 14, 2008 2:48 am
by AmeyJoshi14
Hi,

:!: The link below will give you an idea to implement SCD in datastage. :wink:
http://etl-tools.info/en/datastage/data ... torial.htm

Posted: Wed May 14, 2008 3:52 am
by DSRajesh
This one is in server jobs.But how can we implement in parallel jobs.

Posted: Wed May 14, 2008 3:58 am
by ray.wurlod
If you have version 8, you use the SCD stage. If you have an earlier version you use any stage that can auto-commit to persistent storage.

scd

Posted: Wed May 14, 2008 4:46 am
by DSRajesh
Its simple just use change capture with switch stage

Posted: Wed May 14, 2008 6:13 am
by DSRajesh
No Response so far!!!!!!!!!

Posted: Wed May 14, 2008 6:54 am
by chulett
Good Lord, calm down. How long did you wait before whining, a hour and a half? Sheesh!!!!!!!!

Posted: Wed May 14, 2008 9:04 am
by vivekgadwal
DSRajesh wrote: Can i know how do i can implement in datastage 7.5x2 version.

Please help here
Rajesh,

We sure can help, but it would help if you can go through the link that AmeyJoshi has sent. It would also help if you can post the issue you are facing.
There are several ways to tackle the SCD problem as I, and several other people, pointed out. I usually have used SCD Type 2 approach and a simple way to implement it is using a Update-Insert logic. That is, if you lookup based on the key columns and if they are the same coming from the input, then you update a flag in the output (target) to indicate that the row has been changed and a new row is coming in. Hence, this row is history now. After this update has taken place, then you insert the new row coming from the input into the target (all in the same job).

As I said, the scenario changes from place to place. It might be very complicated to implement too. Please do post your requirement and somebody can help you!

Hope this helps...

Posted: Thu May 15, 2008 9:23 am
by AmeyJoshi14
Hi,
As per the suggestion by Vivek there are number of ways to implement SCD in datastage.
Link which i have posted is for Server job but we can get some idea.From the details given in the link:
http://etl-tools.info/en/datastage/data ... torial.htm
we can implement the same in the Parallel Jobs as well :wink:
Following is one of the method for implementation SCD:
SCD1:
As per the example given in above link we are updating the CUST_TYPE_ID based on CUST_ID.
We can create an another job which will load the lookup Dataset with old values in table(Target_dimension Table)
The job structure is :

Code: Select all

	            Dataset/TMP_Table
	              |
	              |
	Source---->lookup----->Target Dimension.
In the target table we are using simple update query.

SCD2:
In our project we have implemented the SCD2 by adding one more column to target table,as per the example given in the link
i am modifying the column name example CHNGE_CUST_TYPE_ID.If there is no change in the column then we have hardcoded it as "NO CHANGE"
We are doing lookup based on the CUST_ID and passing the value of CUST_TYPE(from the source) as it CUST_TYPE_ID_NEW.The details are
shown in the below link:

http://joshi.amey2000.googlepages.com/scd2changes

The job structure is :

Code: Select all

	            Dataset/TMP_Table
	              |
	              |
	Source---->lookup----->Transfomer----->Target Dimension

In the target table we are using simple updating it based on the CUST_ID.
This is not exactly SCD2,but some modification of SCD2 since we have not added any extra column like Active Flag.

SCD3:
There is only one change with resepect to SCD2 ,that is we are tracking the changes in the dimension with the help of 'date' field ,this new column is CHANGE_DATE.
The first link will give the details in the lookup stage.
http://joshi.amey2000.googlepages.com/scd3

The below link shows that we have updated the new column with CHANGE_DATE with the current_date.
http://joshi.amey2000.googlepages.com/scd3_tranfomer2

Correct if something is missing. :oops: