Can someone explain the DataStage architecture clearly?

dstage2006
Premium Member
Posts: 116
Joined: Fri Jan 20, 2006 2:30 pm

Can someone explain the DataStage architecture clearly?

Post by dstage2006 »

Can someone please explain the DataStage architecture clearly? Also, I would like to know how to implement Type 2 slowly changing dimensions using the Change Data Capture stage, and also using a Transformer and a lookup.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

We build "jobs" which are nothing more than graphical designs of programs. It's important, just like with any language or tool, to develop standards and practices that are repeatable, scalable, meaningful, and add value to the design in a descriptive way. Using judgment and experience for developing robust applications, you will coordinate a series of jobs in an activity to reliably move data around. That coordination will be done using schedulers, scripted job control, a graphical job control, or any combination of the three.

As for implementing an SCD type 2, well, the CDC stage is self-explanatory. A transformer and a lookup are methods for changing data and looking at other rows. An SCD transformation flow is basically:

1. Get a new row from a source
2. Transform it from source form to target
3. Look to see if any variants exist in the target database using the natural key
4. If no variants exist, assign a surrogate key, a starting effective date, set the current row indicator, set version number to 1 and insert the new row into the target table.
5. If variants exist, get the one with the greatest starting effective date.
6. Compare the new row to the most current row fetched
7 type 1. If anything important is different, update the existing row
7 type 2. If anything important is different, update the current row with an ending effective date and unset the current row indicator, then assign a surrogate key, a starting effective date, set the current row indicator, set version number to the retired current row version number + 1 and insert the new row into the target table.
7 type 2 hybrid. If anything not so important is different do a type 1, if something important is different do a type 2.
7 type 3. If a certain column is different, shuffle the history columns dropping the oldest value and update the existing row.
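
The steps above (the type 2 branch) can be sketched outside DataStage in plain Python. This is only an illustration under stated assumptions, not how a DataStage job is actually built: the `DimRow` class, the `apply_scd2` function, the column names, and the in-memory list standing in for the target table are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DimRow:
    # Hypothetical dimension row layout, for illustration only.
    surrogate_key: int
    natural_key: str
    attrs: dict
    effective_start: date
    effective_end: Optional[date]
    is_current: bool
    version: int


def apply_scd2(target, natural_key, attrs, load_date, tracked):
    """Apply one already-transformed source row to the list `target`."""
    # Step 3: look for variants in the target using the natural key.
    variants = [r for r in target if r.natural_key == natural_key]
    # Simplistic surrogate key assignment: highest existing key + 1.
    next_key = max((r.surrogate_key for r in target), default=0) + 1

    if not variants:
        # Step 4: no variants, so insert version 1 as the current row.
        row = DimRow(next_key, natural_key, dict(attrs), load_date, None, True, 1)
        target.append(row)
        return row

    # Step 5: take the variant with the greatest starting effective date.
    current = max(variants, key=lambda r: r.effective_start)

    # Step 6: compare only the "important" (tracked) columns.
    if all(current.attrs.get(c) == attrs.get(c) for c in tracked):
        return current  # nothing important changed

    # Step 7 type 2: retire the current row, then insert the new version.
    current.effective_end = load_date
    current.is_current = False
    row = DimRow(next_key, natural_key, dict(attrs), load_date, None, True,
                 current.version + 1)
    target.append(row)
    return row


dim = []
apply_scd2(dim, "C001", {"city": "Lutz"}, date(2006, 1, 1), ["city"])
apply_scd2(dim, "C001", {"city": "Tampa"}, date(2006, 6, 1), ["city"])
for r in dim:
    print(r.surrogate_key, r.attrs["city"], r.is_current, r.version)
```

In a real job the "retire" and "insert" actions would be an update and an insert against the target table rather than list operations, and surrogate keys would come from whatever key generator the project uses.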

There you go!
Kenneth Bland

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Step 3 is typically most efficiently accomplished with a hashed file that is pre-loaded with the relevant rows and columns from the target table, and kept up to date as source rows are processed.
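
As a rough analogy only (a Python dict is not a hashed file), the pre-load-and-keep-current idea might look like the sketch below. The function names `preload_lookup` and `process`, and the row layout, are assumptions made for illustration. The point it tries to show is that updating the lookup as each source row is handled lets a second change for the same natural key in the same run see the first one.

```python
def preload_lookup(target_rows):
    """Build a natural_key -> row map holding only the current rows."""
    return {r["natural_key"]: r for r in target_rows if r["is_current"]}


def process(source_rows, lookup, tracked):
    """Return (inserts, expirations) and keep `lookup` up to date."""
    inserts, expirations = [], []
    for src in source_rows:
        cur = lookup.get(src["natural_key"])
        # Unchanged in every tracked column: nothing to do.
        if cur is not None and all(cur[c] == src[c] for c in tracked):
            continue
        version = 1 if cur is None else cur["version"] + 1
        if cur is not None:
            expirations.append(cur)  # this row gets an end date in the target
        new_row = {**src, "version": version, "is_current": True}
        inserts.append(new_row)
        # Keep the lookup current so later rows in this run see the change.
        lookup[src["natural_key"]] = new_row
    return inserts, expirations


lookup = preload_lookup([])  # empty target table in this toy run
source = [
    {"natural_key": "C001", "city": "Lutz"},
    {"natural_key": "C001", "city": "Tampa"},
]
inserts, expirations = process(source, lookup, ["city"])
print([r["version"] for r in inserts])
```

Only the columns needed for the comparison have to be loaded into the lookup, which is what keeps it small and fast.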
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ashwin141
Participant
Posts: 95
Joined: Wed Aug 24, 2005 2:26 am
Location: London, UK

Post by ashwin141 »

About DataStage architecture: it follows a client/server architecture.

Client components: Designer, Director, Administrator and Manager
Server components: DS Engine, Repository and Package Installer

To capture data changes, you have to use the Change Capture and Change Apply stages.

SCD can be implemented as explained by Kenneth and Ray.

Regards
Ashwin
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

At run time the architecture is different. The DataStage clients do not need to be involved. As far as database servers are concerned, the DataStage job is just another client application.
dstage2006
Premium Member
Posts: 116
Joined: Fri Jan 20, 2006 2:30 pm

DataStage Architecture

Post by dstage2006 »

In an interview they asked me to explain the architecture of DataStage. I tried to explain SMP and MPP architecture, but they asked me again about the DataStage architecture. I really didn't understand. Can you explain it clearly?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Are they asking about design-time architecture, compile-time architecture or execution (run) time architecture? Each is different.
spendem
Participant
Posts: 19
Joined: Tue Mar 14, 2006 11:08 pm
Location: Mumbai/Bangalore

Post by spendem »

Hi All,

Version Control is also a client component.

Hi Ray,
How do we differentiate between the compile-time architecture and the run-time architecture?
Can you please throw some more light on this topic? :)

Many Thanks,
Spendem
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You might find this post illuminating. :idea:

Conversion to HTML has disrupted some of the graphics slightly - this is not my fault!

This presentation is taken from one of my training classes, so the answer to your next question is "only in exchange for money".
olgc
Participant
Posts: 145
Joined: Tue Nov 18, 2003 9:00 am

Re: DataStage Architecture

Post by olgc »

[quote="dstage2006"]In an interview they asked me to explain the architecture of DataStage. I tried to explain SMP and MPP architecture, but they asked me again about the DataStage architecture. I really didn't understand. Can you explain it clearly?[/quote]

I think they want answers along the following lines:

1. Data warehouse Development Life Cycle
2. DataStage Jobs Folder Structure inside the DataStage project
3. Data Folder Structure
4. Parameters setup or management
5. Naming Convention
6. ETL Design Concepts & Techniques (including the Three-Tier ETL Design Concept (i.e. Extraction, Transformation & Loading), Change Data Capture (CDC) Techniques, Error & Exception Handling, Audit Control - Auto-Balancing, DataStage Version Control and Promotion, and Implementation)

Hope this helps.