
Can someone explain the DataStage architecture clearly?

Posted: Thu Jun 01, 2006 8:44 pm
by dstage2006
Can someone please explain the DataStage architecture clearly? I would also like to know how to implement type 2 slowly changing dimensions using the Change Data Capture stage, and also using a Transformer and a Lookup.

Posted: Thu Jun 01, 2006 9:16 pm
by kcbland
We build "jobs", which are nothing more than graphical designs of programs. It's important, just like with any language or tool, to develop standards and practices that are repeatable, scalable, meaningful, and add value to the design in a descriptive way. Using judgment and experience for developing robust applications, you will coordinate a series of jobs in an activity to reliably move data around. That coordination will be done using schedulers, scripted job control, graphical job control, or any combination of the three.

As for implementing an SCD type 2, well, the CDC stage is self-explanatory. A transformer and a lookup are methods for changing data and looking at other rows. An SCD transformation flow is basically:

1. Get a new row from a source.
2. Transform it from source form to target form.
3. Look to see if any variants exist in the target database using the natural key.
4. If no variants exist, assign a surrogate key and a starting effective date, set the current row indicator, set the version number to 1, and insert the new row into the target table.
5. If variants exist, get the one with the greatest starting effective date.
6. Compare the new row to the most current row fetched.
7 type 1. If anything important is different, update the existing row.
7 type 2. If anything important is different, update the current row with an ending effective date and unset the current row indicator. Then assign a surrogate key and a starting effective date, set the current row indicator, set the version number to the retired current row's version number + 1, and insert the new row into the target table.
7 type 2 hybrid. If anything not so important is different, do a type 1; if something important is different, do a type 2.
7 type 3. If a certain column is different, shuffle the history columns, dropping the oldest value, and update the existing row.
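
As a rough illustration only, the type 2 path through the steps above can be sketched in plain Python. This is not DataStage code: the in-memory list standing in for the target table, the function name apply_scd2, and the column names (sk, eff_start, eff_end, is_current, version) are all assumptions made up for the example.

```python
from datetime import date

def apply_scd2(target, new_row, natural_key, tracked_cols, today):
    """Apply one already-transformed source row to `target` (a list of dicts)."""
    # Step 3: find all variants of this natural key already in the target.
    variants = [r for r in target if r[natural_key] == new_row[natural_key]]
    # Hypothetical surrogate key generator: highest existing key + 1.
    next_sk = max((r["sk"] for r in target), default=0) + 1

    if not variants:
        # Step 4: no variants, so insert version 1 as the current row.
        target.append({**new_row, "sk": next_sk, "eff_start": today,
                       "eff_end": None, "is_current": True, "version": 1})
        return "insert"

    # Step 5: the most current variant has the greatest starting effective date.
    current = max(variants, key=lambda r: r["eff_start"])

    # Step 6: compare only the columns considered important.
    if all(current[c] == new_row[c] for c in tracked_cols):
        return "no change"

    # Step 7, type 2: retire the current row, then insert the new version.
    current["eff_end"] = today
    current["is_current"] = False
    target.append({**new_row, "sk": next_sk, "eff_start": today,
                   "eff_end": None, "is_current": True,
                   "version": current["version"] + 1})
    return "new version"

# Example run with made-up customer data.
dim = []
apply_scd2(dim, {"cust_id": "C1", "city": "Tulsa"}, "cust_id", ["city"], date(2006, 6, 1))
apply_scd2(dim, {"cust_id": "C1", "city": "Austin"}, "cust_id", ["city"], date(2006, 6, 2))
```

After the two calls, dim holds two rows for C1: the retired Tulsa row with an ending effective date, and the current Austin row as version 2. In a real job the scan over the target would be a keyed lookup, and the surrogate key would come from whatever key generator the project uses.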

There you go!

Posted: Fri Jun 02, 2006 12:25 am
by ray.wurlod
Step 3 is typically most efficiently accomplished with a hashed file that is pre-loaded with the relevant rows and columns from the target table, and kept up to date as source rows are processed.
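
To make the idea concrete, here is a minimal sketch in Python, with an ordinary dict standing in for the hashed file. The function names, the is_current flag and the column names are hypothetical and chosen only for illustration; this is not how DataStage itself implements hashed files.

```python
def preload_lookup(target_rows, natural_key, columns):
    """Build an in-memory lookup of current rows, keyed by natural key.

    Only the columns needed for comparison are kept, mirroring the
    "relevant rows and columns" idea.
    """
    return {r[natural_key]: {c: r[c] for c in columns}
            for r in target_rows if r["is_current"]}

def lookup_and_refresh(lookup, natural_key, row, columns):
    """Return the cached current row (or None), then cache the incoming row.

    Refreshing the cache means a later source row with the same natural
    key is compared against this one, not against stale target data.
    """
    previous = lookup.get(row[natural_key])
    lookup[row[natural_key]] = {c: row[c] for c in columns}
    return previous
```

The refresh step is what "kept up to date" buys you: if the same natural key arrives twice in one run, the second row sees the first as its current variant.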

Posted: Fri Jun 02, 2006 3:06 am
by ashwin141
About DataStage architecture: it follows a client/server architecture.

Client components - Designer, Director, Administrator and Manager
Server components - DS Engine, Repository and Package Installer

To capture data changes, you have to use the Change Capture and Change Apply stages.

SCD can be implemented as explained by Kenneth and Ray.

Regards
Ashwin

Posted: Fri Jun 02, 2006 4:30 pm
by ray.wurlod
At run time the architecture is different. The DataStage clients do not need to be involved. As far as database servers are concerned, the DataStage job is just another client application.

DataStage Architecture

Posted: Sun Jun 04, 2006 6:33 pm
by dstage2006
In an interview they asked me to explain the architecture of DataStage. I tried to explain SMP and MPP architecture, but they asked me again about the DataStage architecture. I really didn't understand. Can you explain it clearly?

Posted: Sun Jun 04, 2006 6:38 pm
by ray.wurlod
Are they asking about design-time architecture, compile-time architecture or execution (run) time architecture? Each is different.

Posted: Mon Jun 05, 2006 12:08 am
by spendem
Hi All,

Version Control is also a client component.

Hi Ray,
How do we differentiate between the compile-time architecture and the run-time architecture?
Can you please throw some more light on this topic :) .

Many Thanks,
Spendem

Posted: Mon Jun 05, 2006 1:27 am
by ray.wurlod
You might find this post illuminating. :idea:

Conversion to HTML has disrupted some of the graphics slightly - this is not my fault!

This presentation is taken from one of my training classes, so the answer to your next question is "only in exchange for money".

Re: DataStage Architecture

Posted: Mon Jun 05, 2006 8:42 am
by olgc
[quote="dstage2006"]In an interview they asked me to explain the architecture of DataStage. I tried to explain SMP and MPP architecture, but they asked me again about the DataStage architecture. I really didn't understand. Can you explain it clearly?[/quote]

I think they want answers along the following lines:

1. Data Warehouse Development Life Cycle
2. DataStage Jobs Folder Structure inside the DataStage project
3. Data Folder Structure
4. Parameter Setup and Management
5. Naming Conventions
6. ETL Design Concepts & Techniques, including the Three-Tier ETL Design Concept (i.e. Extraction, Transformation & Loading), Change Data Capture (CDC) Techniques, Error & Exception Handling, Audit Control - Auto-Balancing, DataStage Version Control and Promotion, and Implementation

Hope this helps.