how different job configurations affect job speed

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Views vs User Defined SQL
The view is materialized (the SELECT statement on which it is based is executed) when the view is accessed in a SELECT statement. The difference is negligible and probably indetectable.

Table Joins vs Multiple Reference Inputs
If you have performed the join during the SELECT process (extraction) DataStage will have to process less data. So job duration - all else being equal - will be reduced. On the other hand, if reference lookups need to be performed against the same hashed files in multiple jobs, sharing the disk cache may provide benefit. There's a technical bulletin on this among the DataStage documentation installed with your client software.

Hashed Files vs Oracle Lookups
If Oracle is on a separate machine, the transition occurs immediately. Never perform reference lookups over a network!
As a general rule, the hashed file (particularly if cached in memory) performs faster on a key-based search (which is what a reference lookup is). But there is a small overhead (typically only a few seconds) in loading the hashed file. With a small number of rows being processed (number of reference lookups being performed) the cost of this pre-load may outweigh the benefit of using hashed file. Exactly where the cut-off occurs will depend on a number of factors, including how big a row is, so there's no definitive answer. But it will be a small number.


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

Agree with Ray all his answers. I've got a some additional thoughts on when to use database views:
- if it is to be used as a source by multiple jobs. You face greater risk if you are trying to maintain the same custom SQL in multiple jobs.
- your query uses complex SQL and you want it managed within the database and not in the ETL tool. The view can be built and maintained by a database expert who does not need any exposure to DataStage.
- views are good for database security, you remove access to the base tables and supply read only access to views for the ETL login. You can then implement both column and row based security in your views.


Vincent McBurney
Data Integration Services
www.intramatix.com
Post Reply