Compilation Process - time

Posted: Wed Dec 27, 2006 12:36 am
by vij
Hi all,

I have a job with about 500 derivations, and it takes about an hour to compile, even after a small change. The job has a source stage, a target stage, and a Transformer stage in which the 500 derivations are made.

I would like to know how to reduce the compilation time. Could anyone please shed some light on this?

Thanks

Posted: Wed Dec 27, 2006 12:52 am
by ray.wurlod
The obvious answer is fewer derivations.

Keep in mind it's not just compilation. First, the structure of the job must be sanity-checked, then the expressions in the Transformer stage all have to be converted into C++ equivalents from which source code is generated.

Then that C++ source code must be not only compiled but also linked (that's why there are APT_COMPILE_OPTIONS and APT_LINKER_OPTIONS as environment variables, among others). There is a lot of work that has to be accomplished.

If you are compiling other things at the same time, but only have a single user licence for the compiler, then this will also increase wait times; the second and subsequent requestors for the compiler wait until the first requestor is finished with it.
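To put numbers on the queueing effect, here is a minimal sketch. It assumes every request is submitted at the same moment and served in order by a single compiler licence; the function name and the durations are made up for illustration and are not taken from DataStage.

```python
from itertools import accumulate

def wait_times(compile_minutes):
    """Minutes each request waits before its own compile can start.

    Assumption: one compiler licence, requests served strictly in order,
    all submitted at time zero.
    """
    # Each request waits for the total duration of everything queued ahead of it.
    return list(accumulate(compile_minutes, initial=0))[:-1]

# Hypothetical queue: a one-hour job followed by two five-minute jobs.
print(wait_times([60, 5, 5]))
```

Under those assumptions, the two small jobs spend far longer waiting than compiling, so a slow job can make unrelated compiles look slow as well.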

Posted: Wed Dec 27, 2006 1:00 am
by rleishman
An hour does seem a bit excessive, but I haven't tried anything quite so large. Still, you'd think it would scale.

When you say you have 500 derivations, does that mean you have 500 columns in your target, or 500 stage variables?

What is the nature of these derivations? Are they calls to routines? Calls to transforms?

With this sort of delay, my guess is that you have referenced transforms and/or routines, and DS is running off to the repository for each one to verify the parameters and what-not.

I have found that the more routines you have (i.e. different ones, not references to the same routine), the worse the repository performs. Internally, I suspect the indexing is a bit sub-optimal.

I noticed something interesting in the SDK: all of the routines are shipped with a (redundant) transform of the same name. This could be a work-around for some problem in a previous version, or it could (you'd have to be lucky!) be because transforms compile into code quicker than routines.

You could try isolating the problem by building copies of the job with the same number of target columns but constant derivations. Then start adding complexity, see where it starts to slow down.
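The "add complexity until it slows down" search can be done by bisection rather than one derivation at a time. The sketch below is only an outline of that idea: `compile_seconds` is a hypothetical callback standing in for "build a copy of the job with n real derivations and the rest constant, compile it, and time it", which you would do by hand or with your own tooling. It also assumes compile time never decreases as derivations are added.

```python
from typing import Callable, Optional

def first_slow_count(total: int,
                     compile_seconds: Callable[[int], float],
                     threshold: float) -> Optional[int]:
    """Smallest number of real derivations whose compile time exceeds threshold.

    Returns None if even the full job compiles within the threshold.
    """
    if compile_seconds(total) <= threshold:
        return None
    if compile_seconds(0) > threshold:
        return 0  # slow even with constant derivations only
    lo, hi = 0, total  # invariant: lo compiles fast, hi compiles slow
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if compile_seconds(mid) > threshold:
            hi = mid
        else:
            lo = mid
    return hi

# Made-up timings for illustration: fast below 300 real derivations, slow from 300 on.
fake_timing = lambda n: 5.0 if n < 300 else 3600.0
print(first_slow_count(500, fake_timing, threshold=60.0))
```

With 500 columns this needs roughly ten test compiles instead of hundreds, though each slow one still costs its full compile time.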

Posted: Wed Dec 27, 2006 1:14 am
by ray.wurlod
Transforms are not available in parallel jobs.

Historical Note
The "redundant" Transforms stem from the earliest versions of DataStage, when it was not possible (as in not legal) to call Routines directly from jobs; you had to call them using Transforms as interludes.

Posted: Wed Dec 27, 2006 4:10 am
by vij
Yes, I have 500 columns in the Transformer stage, and the derivation for each is a single "if condition then do this, else pass null", that's all.
I use these 500 columns in the target as well.

Even for a small change to the job, compilation takes an hour, but if I compile without any change, it takes a few seconds.

Is there any way I can reduce the compilation time?

Posted: Wed Dec 27, 2006 5:50 am
by johnthomas
Is the derivation always checking for the same condition? In that case, you could try a Switch stage followed by a Funnel stage.