Better Approach for Extraction

yuva010 · Post by **yuva010** » Mon May 05, 2008 6:02 pm

Hi,

For extraction, I have two source systems on different databases, and I know the joins between them. I have two options:
1. Extract from the source databases to flat files and then feed those files to DataStage. If I go with this approach, can I join the two source files in one job?
2. Use the source databases directly as sources in DataStage.

I am new to DataStage. I have worked with Informatica, and there it really doesn't matter.
What impact will this choice have if I am using DataStage?

ray.wurlod · Post by **ray.wurlod** » Mon May 05, 2008 7:28 pm

It really doesn't matter in DataStage either. The advantage of the flat file approach (in either tool) is that you perform the extraction only once, and restart from a given point is easier because you have the staging area (the flat files). The advantage of the "directly into tool" approach is that data do not touch down on disk and therefore should be processed faster. Which of these is more important in your particular situation?

Minhajuddin · Post by **Minhajuddin** » Mon May 05, 2008 8:21 pm

If I were you, I would definitely dump the data into flat files and then do the joining in a different job. One reason, as Ray pointed out, is that you don't have to extract all the data again if something goes wrong. The second is that you can run the extraction job on more nodes than the join job (as DB operations take comparatively longer than a job that works on local datasets).
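To make the two-job idea concrete outside of DataStage, here is a minimal Python sketch, not a DataStage job. It assumes the extracts are CSV files and that the join key is unique on one side; the function names (`stage_extract`, `join_staged`, `demo`), the column names, and the sample rows are all made up for illustration. The first function stands in for the extraction job and skips work that is already staged, and the second stands in for the separate join job.

```python
import csv
import os
import tempfile


def stage_extract(rows, fieldnames, path):
    """Write extracted rows to a flat file; skip the work if it is already staged."""
    if os.path.exists(path):
        return False  # already staged, so a restart does not re-query the source
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return True


def join_staged(left_path, right_path, key):
    """Inner-join two staged flat files on a shared key column.

    Assumes the key is unique in the right-hand file.
    """
    with open(right_path, newline="") as handle:
        lookup = {row[key]: row for row in csv.DictReader(handle)}
    joined = []
    with open(left_path, newline="") as handle:
        for row in csv.DictReader(handle):
            match = lookup.get(row[key])
            if match is not None:
                joined.append({**row, **match})
    return joined


def demo():
    # Hypothetical rows standing in for the two source databases.
    customers = [{"cust_id": "1", "name": "Asha"}, {"cust_id": "2", "name": "Ben"}]
    orders = [{"cust_id": "1", "amount": "250"}, {"cust_id": "3", "amount": "75"}]
    staging = tempfile.mkdtemp()
    cust_file = os.path.join(staging, "customers.csv")
    order_file = os.path.join(staging, "orders.csv")
    first = stage_extract(customers, ["cust_id", "name"], cust_file)
    stage_extract(orders, ["cust_id", "amount"], order_file)
    # Simulated restart: the customer extract is already on disk.
    rerun = stage_extract(customers, ["cust_id", "name"], cust_file)
    return first, rerun, join_staged(cust_file, order_file, "cust_id")
```

If the join step fails, only `join_staged` needs to be rerun, because the staged files are still on disk.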

yuva010 · Post by **yuva010** » Mon May 05, 2008 8:30 pm

I agree with you both, Minh and Ray.

As the two source systems are different:

With a direct database extract, we can prepare a staging layer in between that holds the source data in one place, and then use it for ETL.

In the case of files, can I bring them onto one Unix box and use them as one source, so that it will work as a stage for me?

ray.wurlod · Post by **ray.wurlod** » Mon May 05, 2008 9:58 pm

yuva010 wrote: In the case of files, can I bring them onto one Unix box and use them as one source, so that it will work as a stage for me?

Definitely. Your database connections from DataStage specify locations for the database servers, and your database client software (on the DataStage server machine) looks after the rest. That's all there is to it!

(It can be even easier once you get to version 8.0.)

Minhajuddin · Post by **Minhajuddin** » Tue May 06, 2008 9:01 am

Yes.

You should use Datasets for better performance.