Oracle Stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
srekant
Premium Member
Premium Member
Posts: 85
Joined: Wed Jan 19, 2005 6:52 am
Location: Detroit

Oracle Stage

Post by srekant »

I am working on DS EE 7.5

I have to read 2 GB of data from a fact table and apply some agrregations and load to target.

Is it better to do aggregations while reading from the oracle table itself using a userdefined query or to read all data in to DataStage and then use an agrregator for doing the same.
Sree
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Both ways will work and both ways will be very easy to build. It comes down to load and bottlenecks. The user-defined SQL method has the benefit of reducing the amount of data to be transferred to the DataStage server, this reduces load on the network and certainly reduces the load on the DS server. It places all the load onto the Oracle database so you need to determine whether this adversely affects end users or other DS jobs.

The DataStage aggregation stage on the other hand is very easy to use, has very good help built in and may be more maintainable then a complex user-defined SQL statement. Especially when it comes to adding an extra aggregation field.

You have a third option, getting your Oracle DBA team to build and maintain a summarised aggregate view of your fact table and selecting from that. This method puts the responsibility for the SQL into the hands of the experts who can tune it.
Post Reply