Can someone explain the DataStage architecture clearly?
Moderators: chulett, rschirm, roy
-
- Premium Member
- Posts: 116
- Joined: Fri Jan 20, 2006 2:30 pm
Can someone please explain the DataStage architecture clearly? Also, I would like to know how to implement Type 2 slowly changing dimensions using the Change Data Capture stage, and also using a Transformer and a Lookup.
We build "jobs", which are nothing more than graphical designs of programs. It's important, just like with any language or tool, to develop standards and practices that are repeatable, scalable, meaningful, and add value to the design in a descriptive way. Using judgment and experience in developing robust applications, you will coordinate a series of jobs in an activity to reliably move data around. That coordination will be done using schedulers, scripted job control, graphical job control, or any combination of the three.
As for implementing an SCD type 2, well, the CDC stage is self-explanatory. A transformer and a lookup are methods for changing data and looking at other rows. An SCD transformation flow is basically:
1. Get a new row from a source
2. Transform it from source form to target
3. Look to see if any variants exist in the target database using the natural key
4. If no variants exist, assign a surrogate key, a starting effective date, set the current row indicator, set version number to 1 and insert the new row into the target table.
5. If variants exist, get the one with the greatest starting effective date.
6. Compare the new row to the most current row fetched
7 type 1. If anything important is different, update the existing row.
7 type 2. If anything important is different, update the current row with an ending effective date and unset the current row indicator. Then assign a surrogate key and a starting effective date, set the current row indicator, set the version number to the retired current row's version number + 1, and insert the new row into the target table.
7 type 2 hybrid. If anything not so important is different, do a type 1; if something important is different, do a type 2.
7 type 3. If a certain column is different, shuffle the history columns, dropping the oldest value, and update the existing row.
There you go!
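For anyone who finds it easier to read as code, here is a minimal sketch of the type 2 branch of the flow above. It is plain Python, not DataStage, and everything in it is an assumption made for illustration: the in-memory `target` list standing in for the dimension table, the column names (`cust_id`, `city`, `eff_start`, `eff_end`, `is_current`, `version`, `sk`), and the simple sequence used for surrogate keys.

```python
from datetime import date
from itertools import count

# Hypothetical in-memory stand-in for the target dimension table.
target = []
next_key = count(1)      # surrogate key generator (assumed: a simple sequence)
TRACKED = ("city",)      # the "important" columns (assumed)


def apply_scd2(row, today):
    """Apply one already-transformed source row using type 2 rules."""
    # Step 3: look for existing variants by natural key.
    variants = [r for r in target if r["cust_id"] == row["cust_id"]]
    if not variants:
        # Step 4: first version of this natural key.
        version = 1
    else:
        # Step 5: the most current variant has the greatest start date.
        current = max(variants, key=lambda r: r["eff_start"])
        # Step 6: compare only the important columns.
        if all(current[c] == row[c] for c in TRACKED):
            return  # nothing important changed, nothing to do
        # Step 7 (type 2): retire the current row.
        current["eff_end"] = today
        current["is_current"] = False
        version = current["version"] + 1
    # Insert the new current row (steps 4 and 7 share this part).
    target.append({
        "sk": next(next_key),
        "cust_id": row["cust_id"],
        **{c: row[c] for c in TRACKED},
        "eff_start": today,
        "eff_end": None,
        "is_current": True,
        "version": version,
    })


apply_scd2({"cust_id": 42, "city": "Sydney"}, date(2006, 1, 20))
apply_scd2({"cust_id": 42, "city": "Sydney"}, date(2006, 2, 1))  # no change
apply_scd2({"cust_id": 42, "city": "Perth"}, date(2006, 3, 1))   # new version
```

After the three calls, `target` holds two rows for customer 42: the retired Sydney row and a current Perth row with version 2. In a real job the list scan in step 3 would be a keyed lookup against the target, not a loop.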
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Step 3 is typically most efficiently accomplished with a hashed file that is pre-loaded with the relevant rows and columns from the target table, and kept up to date as source rows are processed.
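The idea can be sketched roughly as follows, using a Python dict as a stand-in for the hashed file. This is only an illustration under assumed names (`target_rows`, `lookup`, `process`, and the `cust_id`/`city` columns are all hypothetical), not how a DataStage job is actually written.

```python
# Hypothetical rows already in the target table (only the columns the lookup needs).
target_rows = [
    {"cust_id": 1, "sk": 10, "version": 2, "city": "Sydney"},
    {"cust_id": 2, "sk": 11, "version": 1, "city": "Perth"},
]

# Pre-load once: natural key -> current row, like a hashed file keyed on cust_id.
lookup = {r["cust_id"]: r for r in target_rows}


def process(source_row):
    """Classify a source row as 'insert', 'change' or 'same'."""
    key = source_row["cust_id"]
    hit = lookup.get(key)
    if hit is None:
        outcome = "insert"
    elif hit["city"] != source_row["city"]:
        outcome = "change"
    else:
        return "same"
    # Write back so later source rows with the same key see this one.
    lookup[key] = {**(hit or {}), **source_row}
    return outcome


results = [process(r) for r in [
    {"cust_id": 3, "city": "Hobart"},
    {"cust_id": 3, "city": "Hobart"},  # repeat within the same run
    {"cust_id": 1, "city": "Darwin"},
]]
```

The write-back is the "kept up to date" part: without it, the second Hobart row would be treated as another insert because the pre-loaded copy knows nothing about rows processed earlier in the same run.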
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
About DataStage architecture: it follows a client/server architecture.
Client components - Designer, Director, Administrator and Manager
Server components - DS Engine, Repository and Package Installer
To capture data changes, you have to use the Change Capture and Change Apply stages.
SCD can be implemented as explained by Kenneth and Ray.
Regards
Ashwin
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
At run time the architecture is different. The DataStage clients do not need to be involved. As far as database servers are concerned, the DataStage job is just another client application.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 116
- Joined: Fri Jan 20, 2006 2:30 pm
DataStage Architecture
In an interview they asked me to explain the architecture of DataStage. I tried to explain SMP and MPP architecture, but they again asked me for the DataStage architecture. I really didn't understand. Can you explain it clearly?
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
You might find this post illuminating.
Conversion to HTML has disrupted some of the graphics slightly - this is not my fault!
This presentation is taken from one of my training classes, so the answer to your next question is "only in exchange for money".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Re: DataStage Architecture
[quote="dstage2006"]In an interview they asked me to explain the architecture of DataStage. I tried to explain SMP and MPP architecture, but they again asked me for the DataStage architecture. I really didn't understand. Can you explain it clearly?[/quote]
I think they wanted answers along the following lines:
1. Data warehouse development life cycle
2. DataStage jobs folder structure inside the DataStage project
3. Data folder structure
4. Parameter setup and management
5. Naming conventions
6. ETL design concepts and techniques, such as the three-tier ETL design concept (i.e. Extraction, Transformation and Loading), Change Data Capture (CDC) techniques, error and exception handling, audit control (auto-balancing), and DataStage version control, promotion and implementation
Hope this helps.