What is SCD.How we in Data Stage

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

Post Reply
Rajesh.venkat
Participant
Posts: 1
Joined: Fri May 09, 2008 11:38 am
Location: Bangalore
Contact:

What is SCD.How we in Data Stage

Post by Rajesh.venkat »

:?:
RAJESH.VENKAT
vivekgadwal
Premium Member
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Re: What is SCD.How we in Data Stage

Post by vivekgadwal »

Rajesh.venkat wrote: :?:
SCD means Slowly Changing Dimensions. There is a lot of learning material available online on this topic.
To be terse, Slowly Changing Dimension problem is a very common problem in Data Warehouses when an attribute for a row changes over time albeit over a long period. That means, you have loaded some data in a dimension table. Now, after some time, you get a changed attribute for one of the rows. What would you do in that scenario?
There are multiple ways to solve this. However, here are the three main types:

Type1: The new record replaces the original record. No trace of the old record exists.

Type2: A new record is added into the customer dimension table with an Active flag or something. Therefore, the customer is treated essentially as two people. This is the most popular method.

Type3: The original record is modified to reflect the change. Essentially, this method requires adding a new column to reflect the change. This is very seldom used.

Hope this helps. But, to learn more, just google it!!!!
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Type 4 : Like Type 2, but the history is maintained in a separate table.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vivekgadwal
Premium Member
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

ray.wurlod wrote:Type 4 : Like Type 2, but the history is maintained in a separate table.
Ray,

I didn't know that there is a Type 4 SCD. Is this adopted often in companies? So far, I have been doing Type 2 with an Active Flag to denote which is the most recent record!

Thanks,
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It's a bad day when you don't learn anything!

As to how widely it is used, I have no idea. I have seen it on a couple of sites where data volumes are large, and the vast majority of queries were not interested in non-current data - the idea of Type 4 is to speed those queries.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vivekgadwal
Premium Member
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

ray.wurlod wrote:It's a bad day when you don't learn anything!
I know and I haven't come across this Type 4 anywhere...so may be i have to keep an eye on it! :D

Thanks for the reply...
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
AmeyJoshi14
Participant
Posts: 334
Joined: Fri Dec 01, 2006 5:17 am
Location: Texas

Post by AmeyJoshi14 »

Hi,

:!: The link below will give you an idea to implement SCD in datastage. :wink:
http://etl-tools.info/en/datastage/data ... torial.htm
http://findingjobsindatastage.blogspot.com/
Theory is when you know all and nothing works. Practice is when all works and nobody knows why. In this case we have put together theory and practice: nothing works. and nobody knows why! (Albert Einstein)
DSRajesh
Premium Member
Premium Member
Posts: 297
Joined: Mon Feb 05, 2007 10:37 pm

Post by DSRajesh »

This one is in server jobs.But how can we implement in parallel jobs.
RD
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If you have version 8, you use the SCD stage. If you have an earlier version you use any stage that can auto-commit to persistent storage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DSRajesh
Premium Member
Premium Member
Posts: 297
Joined: Mon Feb 05, 2007 10:37 pm

scd

Post by DSRajesh »

Its simple just use change capture with switch stage
Last edited by DSRajesh on Mon May 19, 2008 7:47 am, edited 1 time in total.
RD
DSRajesh
Premium Member
Premium Member
Posts: 297
Joined: Mon Feb 05, 2007 10:37 pm

Post by DSRajesh »

No Response so far!!!!!!!!!
RD
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Good Lord, calm down. How long did you wait before whining, a hour and a half? Sheesh!!!!!!!!
-craig

"You can never have too many knives" -- Logan Nine Fingers
vivekgadwal
Premium Member
Premium Member
Posts: 457
Joined: Tue Sep 25, 2007 4:05 pm

Post by vivekgadwal »

DSRajesh wrote: Can i know how do i can implement in datastage 7.5x2 version.

Please help here
Rajesh,

We sure can help, but it would help if you can go through the link that AmeyJoshi has sent. It would also help if you can post the issue you are facing.
There are several ways to tackle the SCD problem as I, and several other people, pointed out. I usually have used SCD Type 2 approach and a simple way to implement it is using a Update-Insert logic. That is, if you lookup based on the key columns and if they are the same coming from the input, then you update a flag in the output (target) to indicate that the row has been changed and a new row is coming in. Hence, this row is history now. After this update has taken place, then you insert the new row coming from the input into the target (all in the same job).

As I said, the scenario changes from place to place. It might be very complicated to implement too. Please do post your requirement and somebody can help you!

Hope this helps...
Vivek Gadwal

Experience is what you get when you didn't get what you wanted
AmeyJoshi14
Participant
Posts: 334
Joined: Fri Dec 01, 2006 5:17 am
Location: Texas

Post by AmeyJoshi14 »

Hi,
As per the suggestion by Vivek there are number of ways to implement SCD in datastage.
Link which i have posted is for Server job but we can get some idea.From the details given in the link:
http://etl-tools.info/en/datastage/data ... torial.htm
we can implement the same in the Parallel Jobs as well :wink:
Following is one of the method for implementation SCD:
SCD1:
As per the example given in above link we are updating the CUST_TYPE_ID based on CUST_ID.
We can create an another job which will load the lookup Dataset with old values in table(Target_dimension Table)
The job structure is :

Code: Select all

	            Dataset/TMP_Table
	              |
	              |
	Source---->lookup----->Target Dimension.
In the target table we are using simple update query.

SCD2:
In our project we have implemented the SCD2 by adding one more column to target table,as per the example given in the link
i am modifying the column name example CHNGE_CUST_TYPE_ID.If there is no change in the column then we have hardcoded it as "NO CHANGE"
We are doing lookup based on the CUST_ID and passing the value of CUST_TYPE(from the source) as it CUST_TYPE_ID_NEW.The details are
shown in the below link:

http://joshi.amey2000.googlepages.com/scd2changes

The job structure is :

Code: Select all

	            Dataset/TMP_Table
	              |
	              |
	Source---->lookup----->Transfomer----->Target Dimension

In the target table we are using simple updating it based on the CUST_ID.
This is not exactly SCD2,but some modification of SCD2 since we have not added any extra column like Active Flag.

SCD3:
There is only one change with resepect to SCD2 ,that is we are tracking the changes in the dimension with the help of 'date' field ,this new column is CHANGE_DATE.
The first link will give the details in the lookup stage.
http://joshi.amey2000.googlepages.com/scd3

The below link shows that we have updated the new column with CHANGE_DATE with the current_date.
http://joshi.amey2000.googlepages.com/scd3_tranfomer2

Correct if something is missing. :oops:
http://findingjobsindatastage.blogspot.com/
Theory is when you know all and nothing works. Practice is when all works and nobody knows why. In this case we have put together theory and practice: nothing works. and nobody knows why! (Albert Einstein)
Post Reply