Hi all,
I have a job with about 500 derivations, and it takes about an hour to compile, even after a small change. The job has a source stage, a target stage, and a Transformer stage on which the 500 derivations are made.
I would like to know how to reduce the compilation time. Could anyone please shed some light on this?
Thanks
Compilation Process - time
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
The obvious answer is fewer derivations.
Keep in mind it's not just compilation. First, the structure of the job must be sanity-checked, then the expressions in the Transformer stage all have to be converted into C++ equivalents from which source code is generated.
Then that C++ source code must be not only compiled but also linked (that's why there are APT_COMPILE_OPTIONS and APT_LINKER_OPTIONS as environment variables, among others). There is a lot of work that has to be accomplished.
If you are compiling other things at the same time, but only have a single user licence for the compiler, then this will also increase wait times; the second and subsequent requestors for the compiler wait until the first requestor is finished with it.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 252
- Joined: Mon Sep 19, 2005 10:28 pm
- Location: Melbourne, Australia
- Contact:
An hour does seem a bit excessive, but I haven't tried anything quite so large. Still, you'd think it would scale.
When you say you have 500 derivations, does that mean you have 500 columns in your target, or 500 stage variables?
What is the nature of these derivations? Are they calls to routines? Calls to transforms?
With this sort of delay, my guess is that you have referenced transforms and/or routines, and DS is running off to the repository for each one to verify the parameters and what-not.
I have found that the more routines you have (i.e. different ones, not references to the same routine), the worse the repository performs. Internally, I suspect the indexing is a bit sub-optimal.
I noticed something interesting in the SDK: all of the routines are shipped with a (redundant) transform of the same name. This could be a work-around for some problem in a previous version, or it could (you'd have to be lucky!) be because transforms compile into code quicker than routines.
You could try isolating the problem by building copies of the job with the same number of target columns but constant derivations. Then start adding complexity, see where it starts to slow down.
Ross Leishman
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Transforms are not available in parallel jobs.
Historical Note
The "redundant" Transforms stem from the earliest versions of DataStage, when it was not possible (as in not legal) to call Routines directly from jobs; you had to call them using Transforms as interludes.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Yes, I have 500 columns in the Transformer stage, and the derivation for each is a single "if condition then do this else pass null", that's all.
I use these 500 columns in the target as well.
Even for a small change to the job it takes an hour, but if I compile without any change it takes a few seconds.
Is there any way I can reduce the compilation time?
-
- Participant
- Posts: 56
- Joined: Mon Oct 16, 2006 7:32 am