Better Approach for Extraction

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

yuva010
Participant
Posts: 36
Joined: Thu Apr 24, 2008 7:12 pm

Better Approach for Extraction

Post by yuva010 »

Hi,

For extraction, I have two source systems on different databases, and I know how the two are joined. I have two options:
1. Extract from the source databases to flat files and then feed those to DataStage. If I go with this approach, can I join the two source files in one job?
2. Use the source databases directly as sources in DataStage.

I am new to DataStage. I have worked with Informatica, and there the choice really doesn't matter.
What impact will this choice have if I am using DataStage?
Thanks,
Yuva.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

It really doesn't matter in DataStage either. The advantage of the flat file approach (in either tool) is that you perform the extraction only once, and restart from a given point is easier because you have the staging area (the flat files). The advantage of the "directly into tool" approach is that data do not touch down on disk and therefore should be processed faster. Which of these is more important in your particular situation?
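To make the "directly into tool" option concrete, here is a minimal Python sketch of the idea, not a DataStage job. It assumes two in-memory SQLite databases as stand-ins for the two source systems; the table names, column names, and sample rows are all hypothetical. Rows from both sources are joined in memory and nothing is written to disk in between.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Build an in-memory SQLite database standing in for one source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


# Hypothetical source system A: customers.
customers_db = make_source(
    "CREATE TABLE customers (cust_id INTEGER, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Asha"), (2, "Ben")],
)

# Hypothetical source system B: orders.
orders_db = make_source(
    "CREATE TABLE orders (order_id INTEGER, cust_id INTEGER, amount REAL)",
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 2, 40.0), (12, 1, 5.5)],
)


def direct_join(customers_conn, orders_conn):
    """Join rows from both sources in memory; no intermediate file is written."""
    # Build a lookup from the smaller source, keyed on the join column.
    names = dict(customers_conn.execute("SELECT cust_id, name FROM customers"))
    joined = []
    for order_id, cust_id, amount in orders_conn.execute(
        "SELECT order_id, cust_id, amount FROM orders ORDER BY order_id"
    ):
        if cust_id in names:
            joined.append((order_id, names[cust_id], amount))
    return joined


result = direct_join(customers_db, orders_db)
print(result)
```

The trade-off is visible in the sketch: if `direct_join` fails partway, both source queries have to be run again, because nothing was staged.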
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Minhajuddin
Participant
Posts: 467
Joined: Tue Mar 20, 2007 6:36 am
Location: Chennai

Post by Minhajuddin »

If I were you, I would definitely dump the data into flat files and then do the join in a separate job. One reason, as Ray pointed out, is that you don't have to extract all the data again if something goes wrong. The second is that you can run the extraction job on more nodes than the join job, since database operations take comparatively more time than a job that works on local data sets.
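As a rough illustration of the pattern only (this is plain Python, not DataStage), the sketch below separates extraction from joining. It assumes in-memory SQLite databases as stand-ins for the two sources and CSV files in a temporary directory as the staging area; all table, column, and file names are hypothetical.

```python
import csv
import os
import sqlite3
import tempfile


def extract_to_csv(conn, query, path):
    """Extraction step: dump one source query to a flat file, header row first."""
    cur = conn.execute(query)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])
        writer.writerows(cur)


def join_staged(customers_path, orders_path):
    """Join step: reads only the staged files, so it can be rerun on its own."""
    with open(customers_path, newline="") as f:
        names = {row["cust_id"]: row["name"] for row in csv.DictReader(f)}
    with open(orders_path, newline="") as f:
        return [
            (row["order_id"], names[row["cust_id"]], row["amount"])
            for row in csv.DictReader(f)
            if row["cust_id"] in names
        ]


# Hypothetical stand-ins for the two source databases.
src_a = sqlite3.connect(":memory:")
src_a.execute("CREATE TABLE customers (cust_id INTEGER, name TEXT)")
src_a.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])

src_b = sqlite3.connect(":memory:")
src_b.execute("CREATE TABLE orders (order_id INTEGER, cust_id INTEGER, amount REAL)")
src_b.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 25.0), (11, 2, 40.0)])

# Step 1: extract each source once into the staging area.
staging_dir = tempfile.mkdtemp()
customers_file = os.path.join(staging_dir, "customers.csv")
orders_file = os.path.join(staging_dir, "orders.csv")
extract_to_csv(src_a, "SELECT cust_id, name FROM customers", customers_file)
extract_to_csv(
    src_b, "SELECT order_id, cust_id, amount FROM orders ORDER BY order_id", orders_file
)

# The sources are no longer needed once the files exist.
src_a.close()
src_b.close()

# Step 2: join from the staged files. A failure here does not force a re-extract.
joined = join_staged(customers_file, orders_file)
print(joined)
```

Note that values read back from CSV are strings, so a real join step would need to apply types again; the point here is only that step 2 can restart from the files.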
Minhajuddin

yuva010
Participant
Posts: 36
Joined: Thu Apr 24, 2008 7:12 pm

Post by yuva010 »

I agree with you both, Minh and Ray.

Since the two source systems are different:

With a direct database extract, we can set up a staging layer in between that holds the source data in one place, and then use it for ETL.

In the case of files, can I bring them onto one Unix box and use them as one source, so that it works as a staging area for me?
Thanks,
Yuva.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

yuva010 wrote: In the case of files, can I bring them onto one Unix box and use them as one source, so that it works as a staging area for me?
Definitely. Your database connections from DataStage specify locations for the database servers, and your database client software (on the DataStage server machine) looks after the rest. That's all there is to it!

(It can be even easier once you get to version 8.0.)
Minhajuddin
Participant
Posts: 467
Joined: Tue Mar 20, 2007 6:36 am
Location: Chennai

Post by Minhajuddin »

Yes.

You should use Datasets for better performance.
Minhajuddin
