Routine or derivation

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

sumesh.abraham
Participant
Posts: 153
Joined: Thu May 11, 2006 1:52 am
Location: Bangalore

Routine or derivation

Post by sumesh.abraham »

Hello all,

I am analyzing a set of DataStage server jobs for performance enhancement. The job reads a source file (approximately 338,420 records), does 2-3 lookups and then writes to 2 Oracle tables (the DRS stage is used).

It was noted that one of the jobs calls a routine for each record (approximately 338,420 records) to write the current timestamp to an output link.

The routine code is nothing but
OConv(Date(),"D-YMD[4,2,2]") : " " : OConv(Time(),"MTS")
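For completeness, the entire routine body is essentially just that expression assigned to Ans, the routine's return value; a minimal sketch (comments added here for illustration) would be:

* Return the current date and time as a "YYYY-MM-DD HH:MM:SS" string
* Date() and Time() are re-evaluated on every call, i.e. once per row
Ans = OConv(Date(), "D-YMD[4,2,2]") : " " : OConv(Time(), "MTS")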

Currently the job takes 40 minutes to complete, and the bottleneck is between the transformer where the above derivation is applied and the inserts into the two tables. If I avoid the routine call and instead provide the derivation above for the column, will I get any significant improvement in job performance?

Also, the current Array Size and Transaction Size are 1 and 0 respectively. If I raise them to 15000, can the job achieve better performance?

Your inputs are highly appreciated.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Routine or derivation

Post by chulett »

sumesh.abraham wrote: If I avoid the routine call and instead provide the derivation above for the column, will I get any significant improvement in job performance?
None whatsoever. You'd be better off letting Oracle handle the timestamp if it truly needed to be "real time". Better to capture the "job start timestamp" in the initial value of a stage variable and then leverage that, again unless you need the timestamp's value to change over the run of the job.
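For example, a stage variable set up along these lines would do it (svStartTs is just an illustrative name, not something from your job):

Stage variable:    svStartTs
Initial value:     OConv(Date(), "D-YMD[4,2,2]") : " " : OConv(Time(), "MTS")
Derivation:        svStartTs
Column derivation: svStartTs

That way the expression is evaluated once at the start of the run rather than 338,420 times, and every row carries the same job start timestamp.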
sumesh.abraham also wrote: Also, the current Array Size and Transaction Size are 1 and 0 respectively. If I raise them to 15000, can the job achieve better performance?
I wouldn't touch the Transaction Size as zero should have been used for a reason. Array Size, however, set to one would indeed be slow, possibly slower than it needs to be. You could try raising it (15000 would be extreme IMHO), but there are times when 1 is used purposefully; it's hard to say one way or the other based on the information we have. What kind of "writes" are happening in the target? Any large fields involved, like perhaps a CLOB? Are the lookups against hashed files? Are they cached? Properly sized?
-craig

"You can never have too many knives" -- Logan Nine Fingers
sumesh.abraham
Participant
Posts: 153
Joined: Thu May 11, 2006 1:52 am
Location: Bangalore

Post by sumesh.abraham »

chulett wrote: What kind of "writes" are happening in the target? Any large fields involved, like perhaps a CLOB? Are the lookups against hashed files? Are they cached? Properly sized?
There are no large fields involved, such as a CLOB. Currently one lookup is done in such a way that a hashed file is created from a table (in the same job) and then looked up. For the other 2 lookups, no hashed files are used; instead, direct lookups against the Oracle tables are done.

The hashed file does not have 'Allow stage write cache' enabled. Group size and minimum modulus are both 1.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I was referring to read cache, not write cache, and with a minimum modulus of 1 there has clearly been no attempt to optimize it. If the "bottleneck" is the transformer then it's your lookups; do whatever you need to do to speed them up. Why are direct database lookups being used rather than hashed files?
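To give a very rough idea of the sizing involved (the average row size here is purely an assumption for illustration): a dynamic hashed file with the default GROUP.SIZE of 1 uses 2048-byte groups, so 338,420 reference rows at, say, 100 bytes each is roughly 34 MB of data, which suggests a MINIMUM.MODULUS in the region of

34,000,000 / 2048 = roughly 16,600 groups, rounded up with some headroom to 20,000 or so

rather than 1. Pre-sizing like that saves the file from repeatedly splitting groups while it is being loaded. The Hashed File Calculator utility can work this out properly from your actual row counts and sizes.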

You still haven't mentioned your target writes - inserts only, upserts, what is going on there? Have you tried replacing the Oracle target with a sequential file to see what changes? That's all part of the trouble-shooting, all part of determining the true bottleneck.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sumesh.abraham
Participant
Posts: 153
Joined: Thu May 11, 2006 1:52 am
Location: Bangalore

Post by sumesh.abraham »

Thanks.

Is there a way to define the read cache at the job level rather than at the Project level?
We are only doing Inserts using the DRS stage (Insert without clearing).
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

No. Hashed file cache size is a project property (or a global property if you use public caching).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.