datastage database usage

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
palmeal
Participant
Posts: 122
Joined: Thu Oct 14, 2004 7:56 am
Location: Edinburgh, Scotland

datastage database usage

Post by palmeal »

My company are in the process of developing a d/w.
The sources for the data are a series of Sybase database, two sql servers and one access database.
Datastage will be used to transform the data and MS SQL 2K will host the d/w. These things are set in stone.

The question I have is regarding the transform database. I am aware that the Datastage jobs/transofrms will go to the source databases for their information but where should I store the data that comes back from these jobs ? Not all data will be available at the same time so some transforms will have to wait until all data has been collected.
Do I create a database schema within datastage and should I bring the data back to a separate sql server before transforming it ?
If both methods are possible then what are the pros/cons.


This is the one grey area that we have and as we are going through a process of "on the job training" any help would be appreciated here.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

DataStage jobs are like any other normal jobs. You work on some data and produce result data. If the result is to be useful in any way, it must be recorded - either in a separate area or over the existing data.

You can extract information from various objects and components as and when you wish. These extracted sets can then be 'collected' as you desire. But all the work is to be detailed in your jobs and control structure laid around it.

Also the meta-data can be used and shared across job to improve development and performance.

If you can ask more specific questions with possible examples, we will be able to explain in more detail.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You can stage your data wherever and however you please. Fastest is text files. You can create DataStage jobs to extract from source and stage the data in text files on the DataStage server machine.

Once all data have arrived, you can then process them through DataStage jobs to perform whatever transformations you require (which will include key management, for example generation and reverse-lookup of any surrogate keys, particularly for Type 2 slowly changing dimensions).

Output from this phase can be directly to the DW database, or can be to another staging area (text files again) from which a bulk loader might be used to load the DW database. Having the second staging area also means that restartability is easier to design.

The whole is controlled by DataStage using either job sequences or hand-crafted job control code, or some amalgam thereof.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply