Routine or derivation

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

sumesh.abraham
Participant
Posts: 153
Joined: Thu May 11, 2006 1:52 am
Location: Bangalore

Routine or derivation

Post by sumesh.abraham »

Hello all,

I am analyzing a set of DataStage server jobs for performance enhancement. The job reads a source file (approximately 338,420 records), does 2-3 lookups and then writes to 2 Oracle tables (the DRS stage is used).

It was noted that one of the jobs calls a routine for each record (approximately 338,420 records) to write the current timestamp to an output link.

The routine code is nothing but
OConv(Date(),"D-YMD[4,2,2]") : " " : OConv(Time(),"MTS")
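For completeness, the entire routine body is essentially just that expression assigned to Ans, the routine's return value; a minimal sketch (comments added here for illustration) would be:

* Return the current date and time as a "YYYY-MM-DD HH:MM:SS" string
* Date() and Time() are re-evaluated on every call, i.e. once per row
Ans = OConv(Date(), "D-YMD[4,2,2]") : " " : OConv(Time(), "MTS")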

Currently the job takes 40 minutes to complete, and the bottleneck is between the transformer where the above derivation is applied and the inserts into the two tables. If I avoid the routine call and instead provide the derivation above for the column, will I get any significant improvement in job performance?

Also, the current Array Size and Transaction Size are 1 and 0 respectively. If I raise them to 15000, can the job achieve better performance?

Your inputs are highly appreciated.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: Routine or derivation

Post by chulett »

sumesh.abraham wrote: If I avoid the routine call and instead provide the derivation above for the column, will I get any significant improvement in job performance?
None whatsoever. You'd be better off letting Oracle handle the timestamp if it truly needed to be "real time". Better to capture the "job start timestamp" in the initial value of a stage variable and then leverage that, again unless you need the timestamp's value to change over the run of the job.
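For example, a stage variable set up along these lines would do it (svStartTs is just an illustrative name, not something from your job):

Stage variable:    svStartTs
Initial value:     OConv(Date(), "D-YMD[4,2,2]") : " " : OConv(Time(), "MTS")
Derivation:        svStartTs
Column derivation: svStartTs

That way the expression is evaluated once at the start of the run rather than 338,420 times, and every row carries the same job start timestamp.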
sumesh.abraham also wrote: Also, the current Array Size and Transaction Size are 1 and 0 respectively. If I raise them to 15000, can the job achieve better performance?
I wouldn't touch the Transaction Size as zero should have been used for a reason. Array Size, however, set to one would indeed be slow, possibly slower than it needs to be. You could try raising it (15000 would be extreme IMHO), but there are times when 1 is used purposefully; it's hard to say one way or the other based on the information we have. What kind of "writes" are happening in the target? Any large fields involved, like perhaps a CLOB? Are the lookups against hashed files? Are they cached? Properly sized?
-craig

"You can never have too many knives" -- Logan Nine Fingers
sumesh.abraham
Participant
Posts: 153
Joined: Thu May 11, 2006 1:52 am
Location: Bangalore

Post by sumesh.abraham »

chulett wrote: What kind of "writes" are happening in the target? Any large fields involved, like perhaps a CLOB? Are the lookups against hashed files? Are they cached? Properly sized?
There are no large fields involved, such as a CLOB. Currently one lookup is done in such a way that a hashed file is created from a table (in the same job) and then looked up. For the other 2 lookups, no hashed files are used; instead, direct lookups against the Oracle tables are done.

The hashed file does not have 'Allow stage write cache' enabled. Group size and minimum modulus are both 1.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I was referring to read cache, not write cache, and with a minimum modulus of 1 there has clearly been no attempt to optimize it. If the "bottleneck" is the transformer then it's your lookups; do whatever you need to do to speed them up. Why are direct database lookups being used rather than hashed files?
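To give a very rough idea of the sizing involved (the average row size here is purely an assumption for illustration): a dynamic hashed file with the default GROUP.SIZE of 1 uses 2048-byte groups, so 338,420 reference rows at, say, 100 bytes each is roughly 34 MB of data, which suggests a MINIMUM.MODULUS in the region of

34,000,000 / 2048 = roughly 16,600 groups, rounded up with some headroom to 20,000 or so

rather than 1. Pre-sizing like that saves the file from repeatedly splitting groups while it is being loaded. The Hashed File Calculator utility can work this out properly from your actual row counts and sizes.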

You still haven't mentioned your target writes - inserts only, upserts, what is going on there? Have you tried replacing the Oracle target with a sequential file to see what changes? That's all part of the trouble-shooting, all part of determining the true bottleneck.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sumesh.abraham
Participant
Posts: 153
Joined: Thu May 11, 2006 1:52 am
Location: Bangalore

Post by sumesh.abraham »

Thanks.

Is there a way to define the read cache at the job level rather than at the Project level?
We are only doing Inserts using the DRS stage (Insert without clearing).
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

No. Hashed file cache size is a project property (or a global property if you use public caching).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.