What is SCD.How we in Data Stage

vivekgadwal · Post by **vivekgadwal** » Tue May 13, 2008 3:23 pm

Rajesh.venkat wrote:

SCD means Slowly Changing Dimensions. There is a lot of learning material available online on this topic.
To be terse, Slowly Changing Dimension problem is a very common problem in Data Warehouses when an attribute for a row changes over time albeit over a long period. That means, you have loaded some data in a dimension table. Now, after some time, you get a changed attribute for one of the rows. What would you do in that scenario?
There are multiple ways to solve this. However, here are the three main types:

Type1: The new record replaces the original record. No trace of the old record exists.

Type2: A new record is added into the customer dimension table with an Active flag or something. Therefore, the customer is treated essentially as two people. This is the most popular method.

Type3: The original record is modified to reflect the change. Essentially, this method requires adding a new column to reflect the change. This is very seldom used.

Hope this helps. But, to learn more, just google it!!!!

ray.wurlod · Post by **ray.wurlod** » Tue May 13, 2008 4:22 pm

Type 4 : Like Type 2, but the history is maintained in a separate table.

vivekgadwal · Post by **vivekgadwal** » Tue May 13, 2008 4:26 pm

ray.wurlod wrote:Type 4 : Like Type 2, but the history is maintained in a separate table.

Ray,

I didn't know that there is a Type 4 SCD. Is this adopted often in companies? So far, I have been doing Type 2 with an Active Flag to denote which is the most recent record!

Thanks,

ray.wurlod · Post by **ray.wurlod** » Tue May 13, 2008 5:01 pm

It's a bad day when you don't learn anything!

As to how widely it is used, I have no idea. I have seen it on a couple of sites where data volumes are large, and the vast majority of queries were not interested in non-current data - the idea of Type 4 is to speed those queries.

vivekgadwal · Post by **vivekgadwal** » Tue May 13, 2008 5:08 pm

ray.wurlod wrote:It's a bad day when you don't learn anything!

I know and I haven't come across this Type 4 anywhere...so may be i have to keep an eye on it! :D

Thanks for the reply...

AmeyJoshi14 · Post by **AmeyJoshi14** » Wed May 14, 2008 2:48 am

Hi,

The link below will give you an idea to implement SCD in datastage.

http://etl-tools.info/en/datastage/data ... torial.htm

DSRajesh · Post by **DSRajesh** » Wed May 14, 2008 3:52 am

This one is in server jobs.But how can we implement in parallel jobs.

ray.wurlod · Post by **ray.wurlod** » Wed May 14, 2008 3:58 am

If you have version 8, you use the SCD stage. If you have an earlier version you use any stage that can auto-commit to persistent storage.

DSRajesh · Post by **DSRajesh** » Wed May 14, 2008 4:46 am

Its simple just use change capture with switch stage

DSRajesh · Post by **DSRajesh** » Wed May 14, 2008 6:13 am

No Response so far!!!!!!!!!

chulett · Post by **chulett** » Wed May 14, 2008 6:54 am

Good Lord, calm down. How long did you wait before whining, a hour and a half? Sheesh!!!!!!!!

vivekgadwal · Post by **vivekgadwal** » Wed May 14, 2008 9:04 am

DSRajesh wrote: Can i know how do i can implement in datastage 7.5x2 version.

Please help here

Rajesh,

We sure can help, but it would help if you can go through the link that AmeyJoshi has sent. It would also help if you can post the issue you are facing.
There are several ways to tackle the SCD problem as I, and several other people, pointed out. I usually have used SCD Type 2 approach and a simple way to implement it is using a Update-Insert logic. That is, if you lookup based on the key columns and if they are the same coming from the input, then you update a flag in the output (target) to indicate that the row has been changed and a new row is coming in. Hence, this row is history now. After this update has taken place, then you insert the new row coming from the input into the target (all in the same job).

As I said, the scenario changes from place to place. It might be very complicated to implement too. Please do post your requirement and somebody can help you!

Hope this helps...

AmeyJoshi14 · Post by **AmeyJoshi14** » Thu May 15, 2008 9:23 am

Hi,
As per the suggestion by Vivek there are number of ways to implement SCD in datastage.
Link which i have posted is for Server job but we can get some idea.From the details given in the link:
http://etl-tools.info/en/datastage/data ... torial.htm
we can implement the same in the Parallel Jobs as well

Following is one of the method for implementation SCD:
SCD1:
As per the example given in above link we are updating the CUST_TYPE_ID based on CUST_ID.
We can create an another job which will load the lookup Dataset with old values in table(Target_dimension Table)
The job structure is :

Code: Select all

	            Dataset/TMP_Table
	              |
	              |
	Source---->lookup----->Target Dimension.

In the target table we are using simple update query.

SCD2:
In our project we have implemented the SCD2 by adding one more column to target table,as per the example given in the link
i am modifying the column name example CHNGE_CUST_TYPE_ID.If there is no change in the column then we have hardcoded it as "NO CHANGE"
We are doing lookup based on the CUST_ID and passing the value of CUST_TYPE(from the source) as it CUST_TYPE_ID_NEW.The details are
shown in the below link:

http://joshi.amey2000.googlepages.com/scd2changes

The job structure is :

Code: Select all

	            Dataset/TMP_Table
	              |
	              |
	Source---->lookup----->Transfomer----->Target Dimension

In the target table we are using simple updating it based on the CUST_ID.
This is not exactly SCD2,but some modification of SCD2 since we have not added any extra column like Active Flag.

SCD3:
There is only one change with resepect to SCD2 ,that is we are tracking the changes in the dimension with the help of 'date' field ,this new column is CHANGE_DATE.
The first link will give the details in the lookup stage.
http://joshi.amey2000.googlepages.com/scd3

The below link shows that we have updated the new column with CHANGE_DATE with the current_date.
http://joshi.amey2000.googlepages.com/scd3_tranfomer2

Correct if something is missing.

DSXchange

What is SCD.How we in Data Stage

What is SCD.How we in Data Stage

Re: What is SCD.How we in Data Stage

scd