DSJobStartDate in Stage Variable calculation

Post by **patonp** » Thu Dec 22, 2005 7:39 am

We're seeing a very strange situation here...

When we reference the macro DSJobStartDate in a stage variable calculation, the throughput for our job is 85 rows/sec. However, when we remove the reference to DSJobStartDate from the stage variable and simply reference DSJobStartDate in a column derivation, performance improves to roughly 700 rows/sec. Also, referencing other macros (e.g. DSJobName) in a stage variable does not cause these performance problems. Any idea why this would be?

Thanks!

Peter

Post by **chulett** » Thu Dec 22, 2005 7:54 am

No.

A little tip, however. Something like DSJobStartDate, which doesn't change over the course of the job, only needs to be 'derived' once, not over and over for each row. So I still use it in a Stage Variable, but instead of putting the call in the Derivation, I put it in the Initial Value field when defining the variable. Leave the derivation blank (i.e. it doesn't change) and then simply reference the variable in your job. Works great.

Post by **ArndW** » Thu Dec 22, 2005 8:04 am

Peter,

You've found an interesting point. I wrote a test, analyzed it, and found that the DSJobStartDate macro gets converted to an internal variable lookup when used in a constraint or a derivation. But when it is assigned to a stage variable, it calls a locally catalogued routine which is only 4 lines long and returns the value from another system location. The overhead of a complete PCL is quite high compared to a variable lookup, and this accounts for your speed difference.
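To illustrate the shape of that difference, here is a small Python analogy, under the assumption that a dictionary read models the "internal variable lookup" and a wrapper function models the small catalogued routine. The names `_job_info` and `_fetch_start_date` are hypothetical. Python's call overhead is not DataStage BASIC's, so this will not reproduce the 85 vs. 700 rows/sec figures from the thread; it only shows that routing every row through an extra procedure call costs more than reading the value directly.

```python
import timeit

# Hypothetical "system location" holding the job start date.
_job_info = {"start_date": "2005-12-22"}


def _fetch_start_date() -> str:
    # Stand-in for the short catalogued routine: one extra call frame per use.
    return _job_info["start_date"]


def inline_lookup(n: int) -> int:
    """Like a derivation: the generated code reads the value directly."""
    total = 0
    for _ in range(n):
        value = _job_info["start_date"]
        total += len(value)
    return total


def routine_call(n: int) -> int:
    """Like the stage-variable derivation: every row goes through a call."""
    total = 0
    for _ in range(n):
        value = _fetch_start_date()
        total += len(value)
    return total


if __name__ == "__main__":
    for fn in (inline_lookup, routine_call):
        secs = timeit.timeit(lambda: fn(100_000), number=5)
        print(f"{fn.__name__}: {secs:.3f}s")
```

Both functions compute the same result; only the per-iteration cost differs, and the exact timings will vary by machine.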

What you need to do in the stage variable is assign the value only once instead of on every row (since the job's start time doesn't change). To do this, put the DSJobStartDate call into the Initial Value section and leave the derivation of the stage variable empty. This ensures that it is called once and only once, and it will bring your performance numbers back up.

Post by **chulett** » Thu Dec 22, 2005 8:07 am

Sounds like familiar advice.

Interesting find. By the way - 'complete PCL'?

Post by **patonp** » Thu Dec 22, 2005 8:27 am

Thanks guys.

Great solution - simple and fast!

Post by **ArndW** » Thu Dec 22, 2005 9:10 am

Craig - I didn't see your post until after I responded... we did think along the same lines, though.

PCL is short for "Procedure Call", which is a complete call: pushing the current environment onto the stack, doing the call-by-reference/call-by-value substitutions, loading the called procedure into the environment, and finally popping it off the stack again and returning value(s) to the original procedure.

Post by **chulett** » Thu Dec 22, 2005 9:49 am

Thanks!

Post by **pneumalin** » Thu Dec 22, 2005 10:14 am

Arnold,
When you have time, could you please explain how you found the variable lookup in your test case? We're just trying to learn more so we can do it ourselves if we run into a similar problem in the future.
BTW, Peter and I are working on the same issue.
Please accept my deep appreciation for the prompt responses from all of you.
Thanks again!

Post by **ArndW** » Thu Dec 22, 2005 12:00 pm

That was quite easy. I wrote a simple job with one transformer and put the macro in a stage variable, a constraint and a derivation. I then compiled the job and looked at the generated BASIC code for the transformer to see what the code generator had done.

Post by **chulett** » Sat Dec 24, 2005 11:00 am

From a private message, answering here so others can (hopefully) benefit:

Hey Craig,

Referring to your suggestion of using DSJobStartDate in a stage variable's Initial Value rather than in the Derivation, do you think the same is valid for constants? I am hard coding a lot of constants for my target file in the derivations. Is your logic valid for constants too? Should I be declaring all my constants in stage variables, since they don't change? Please comment on this situation. Performance is very important for my job.

The advice applies to unnecessary repetition of derivations. In the specific case referenced in this thread, DSJobStartDate only needed to be called once and then referenced on each row, rather than called on each row.

If by 'constants' in Derivations and Constraints you literally mean a constant value, then no, I don't see a burning need to load them into a Stage Variable, with two exceptions. One is that you may want to take advantage of the 'self documentation' that properly named stage variables bring to the table: the name you use may be more meaningful to future developers than a hard-coded value. The other is that if you use the same 'hard-coded' value in several places, consolidating it into one place (a stage variable) and referencing it everywhere else makes it simple to update should that value ever need to change.
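As a loose Python analogy of those two reasons (readability and a single point of change, not speed), a named module-level constant plays the role of a stage variable that has an Initial Value and no Derivation. The column names and values below are hypothetical, invented only for illustration.

```python
# Hypothetical target-file constants, each named once and reused.
SV_RECORD_TYPE = "D"          # meaningful name instead of a bare "D"
SV_SOURCE_SYSTEM = "CRM01"    # used in two places below; change it here only


def build_row(customer_id: str) -> dict:
    """Builds one output row; every use of a constant goes through its name."""
    return {
        "record_type": SV_RECORD_TYPE,
        "source_system": SV_SOURCE_SYSTEM,
        "key": f"{SV_SOURCE_SYSTEM}-{customer_id}",
    }
```

If the source system code ever changes, only the one assignment needs editing, and both the `source_system` column and the `key` pick up the new value.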

Then there is the situation where you have a complex derivation, one that may be used without change in several derivations or constraints. Evaluating it once in the Stage Variable and setting a boolean value there lets you use that same stage variable in the various derivations or constraints without fear of one not exactly matching the other. It also means the complex expression is only evaluated once, is named something that is easier to understand than the derivation itself and is easily modified in all the appropriate places at once.

Note that I will do this even if the complex derivation is only used once, simply to make the job easier to read. For example, with a boolean (true/false) stage variable called 'svPriorityAd', a derivation such as "If svPriorityAd Then X Else Y" is far easier to understand than the possibly four-mile-long computation that goes into determining that fact. If someone is really concerned about how that variable is derived, they can dig into it up there.
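A minimal Python sketch of the same idea: the complex test is written once, given the name from the post (`sv_priority_ad`, mirroring 'svPriorityAd'), evaluated once per row, and then reused by several "derivations" and a "constraint". The `Ad` fields, the priority rule itself and the output columns are all hypothetical stand-ins for whatever the real four-mile-long computation would be.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    # Hypothetical input columns, invented for illustration.
    section: str
    paid_upgrade: bool
    page: int


def sv_priority_ad(ad: Ad) -> bool:
    """The complex test, written once and given a meaningful name."""
    return ad.paid_upgrade and (ad.section == "FRONT" or ad.page <= 3)


def transform(ad: Ad) -> dict:
    priority = sv_priority_ad(ad)  # evaluated once per row, like a stage variable
    return {
        "placement": "TOP" if priority else "STANDARD",  # derivation 1
        "surcharge": 25.0 if priority else 0.0,          # derivation 2
        "route_to_review": priority,                      # used as a constraint
    }
```

Because every output reads the same `priority` value, the derivations and the constraint cannot drift out of step with one another, and changing the rule means editing only `sv_priority_ad`.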

Hope that helps.