datastage database usage

palmeal · Post by **palmeal** » Fri Jan 21, 2005 3:54 am

My company are in the process of developing a d/w.
The sources for the data are a series of Sybase database, two sql servers and one access database.
Datastage will be used to transform the data and MS SQL 2K will host the d/w. These things are set in stone.

The question I have is regarding the transform database. I am aware that the Datastage jobs/transofrms will go to the source databases for their information but where should I store the data that comes back from these jobs ? Not all data will be available at the same time so some transforms will have to wait until all data has been collected.
Do I create a database schema within datastage and should I bring the data back to a separate sql server before transforming it ?
If both methods are possible then what are the pros/cons.

This is the one grey area that we have and as we are going through a process of "on the job training" any help would be appreciated here.

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Fri Jan 21, 2005 4:00 am

DataStage jobs are like any other normal jobs. You work on some data and produce result data. If the result is to be useful in any way, it must be recorded - either in a separate area or over the existing data.

You can extract information from various objects and components as and when you wish. These extracted sets can then be 'collected' as you desire. But all the work is to be detailed in your jobs and control structure laid around it.

Also the meta-data can be used and shared across job to improve development and performance.

If you can ask more specific questions with possible examples, we will be able to explain in more detail.

ray.wurlod · Post by **ray.wurlod** » Fri Jan 21, 2005 4:00 am

You can stage your data wherever and however you please. Fastest is text files. You can create DataStage jobs to extract from source and stage the data in text files on the DataStage server machine.

Once all data have arrived, you can then process them through DataStage jobs to perform whatever transformations you require (which will include key management, for example generation and reverse-lookup of any surrogate keys, particularly for Type 2 slowly changing dimensions).

Output from this phase can be directly to the DW database, or can be to another staging area (text files again) from which a bulk loader might be used to load the DW database. Having the second staging area also means that restartability is easier to design.

The whole is controlled by DataStage using either job sequences or hand-crafted job control code, or some amalgam thereof.