Difference/Advantage of using Transform instead of Routine

Posted: Thu Jul 01, 2004 2:06 pm
by willpeng
Can anyone enlighten me on why I would want to use a DS transform instead of a DS routine, given that I am calling the routine from the transform as well?

My limited understanding is that using it as just a routine will degrade performance, but can anyone tell me why?

Willy

Posted: Thu Jul 01, 2004 3:04 pm
by chulett
Don't really have time for the Full Wurlod, but in a nutshell...

A routine is called by the job when it runs, so the disadvantage is the overhead of the context switching and the passing of the arguments back and forth. This can become significant when dealing with large data volumes. The advantage is that any change made to a routine is automatically picked up by every job that uses it the next time that job runs.

A transform is a piece of code that is substituted in by the compiler at compile time, so there is no 'overhead' associated with it. The downside is you have to recompile any jobs that use those transforms before they would see the change.
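
A rough analogy only, in plain Python rather than DataStage BASIC: the sketch below mimics a transform whose expression text is copied into the job when it is compiled, and a routine that the job only references by name. The names `transforms`, `routines`, `AddBonus` and the two `compile_job_*` helpers are hypothetical and exist purely for illustration; nothing here reflects how the real compiler works internally.

```python
# Hypothetical repository: a transform is an expression, a routine is code.
transforms = {"AddBonus": "amount + 10"}
routines = {"AddBonus": lambda amount: amount + 10}


def compile_job_with_transform(name):
    # The transform's expression text is copied into the job at compile time.
    expr = transforms[name]
    return eval("lambda amount: " + expr)


def compile_job_with_routine(name):
    # The job keeps only the routine's name; it is looked up when called.
    return lambda amount: routines[name](amount)


job_t = compile_job_with_transform("AddBonus")
job_r = compile_job_with_routine("AddBonus")

# Change both definitions after the jobs were compiled.
transforms["AddBonus"] = "amount + 20"
routines["AddBonus"] = lambda amount: amount + 20

stale = job_t(100)       # compiled-in expression: the change is not seen
fresh = job_r(100)       # routine resolved at run time: the change is seen
recompiled = compile_job_with_transform("AddBonus")(100)
```

In this toy model the transform-based job keeps returning the old result until it is recompiled, while the routine-based job sees the new definition on its next call.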

Posted: Thu Jul 01, 2004 4:04 pm
by ray.wurlod
There's no context switching when calling a routine; the only overhead is determining its location (via the Catalog, or VOC file) and loading it into memory (this process is sometimes called "link snapping"); after that, the in-memory location is cached.
As part of loading the routine, stack entries must be constructed for normal and error return, as well as for argument passing. Later, these stack entries must be deallocated.
Even so, the comparison is between some overhead and none.
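
As a loose illustration only (plain Python, not the engine's actual mechanism), the "locate once, then use the cached in-memory reference" behaviour can be sketched as below. `CATALOG`, `KeepDigits`, `resolve_routine` and `call_routine` are hypothetical names invented for this sketch.

```python
# Hypothetical stand-in for the Catalog/VOC: maps routine names to code.
CATALOG = {"KeepDigits": lambda s: "".join(ch for ch in s if ch.isdigit())}

_snapped = {}   # routines that have already been located and loaded
lookups = 0     # counts how many times the lookup actually happens


def resolve_routine(name):
    """Locate a routine on first use, then reuse the cached reference."""
    global lookups
    if name not in _snapped:
        lookups += 1                    # only the first call pays this cost
        _snapped[name] = CATALOG[name]
    return _snapped[name]


def call_routine(name, *args):
    # Every call still pays for argument passing and return handling.
    return resolve_routine(name)(*args)


rows = ["ab12", "3c4", "xyz"]
results = [call_routine("KeepDigits", r) for r in rows]
```

Here the lookup runs once for three rows; what remains per row is the cost of the call itself, which is the "some overhead" side of the comparison.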

A Transform is a single expression, stored with associated documentation (and references to data elements, if you use these) in the Repository. When a job that uses a Transform is compiled, the Transform's defining expression is copied into the job as in-line code.
Some Transforms call Routines; in this case that advantage is lost. The advantage now is, for example, having something meaningfully named that a developer can use (such as DIGITS) rather than requiring the developer to learn the arcane underlying function.

There is a limit to what can be done in a single expression, and some things (such as searching a dynamic array) simply cannot be done in expressions, because they require statements. In this case, you are forced into using a Routine, since it's the only way to achieve the task.

Routines that are transform functions are called for every row processed, so should be as lightweight and efficient as possible. Before/after subroutines, on the other hand, are executed only once per job run, so can be quite heavy duty.

Posted: Thu Jul 01, 2004 4:57 pm
by chulett
ray.wurlod wrote:There's no context switching when calling a routine; the only overhead is determining its location (via the Catalog, or VOC file) and loading it into memory (this process is sometimes called "link snapping"); after that, the in-memory location is cached.
Ah... thanks for the clarification. It was explained to me once that way, and I've been merrily passing it along ever since. :? Silly Wabbit.

Posted: Thu Jul 01, 2004 5:42 pm
by kduke
Ah the Full Wurlod. What would life be without a Full Wurlod once in a while.

Posted: Thu Jul 01, 2004 7:12 pm
by chulett
Hear Hear! :lol:

Posted: Fri Jul 02, 2004 8:59 am
by willpeng
Thanks!!! It helps.

So I guess turning a routine into a transform does not increase performance, huh?

So is there a joke here that I didn't get???

Posted: Fri Jul 02, 2004 10:55 am
by willpeng
What about a Stage Variable? Any advantage in using it instead of Transform or Routine? It looks and smells like a transform specific for the job.

Posted: Fri Jul 02, 2004 11:14 am
by chulett
Stage Variables are very handy and can be used to simplify things. They are evaluated (in order) before the derivations in your output links, so they can be used to cut down on the amount of work done in the Transformer.

For example, a complex derivation that is used in multiple output links can be put in a Stage Variable, evaluated once and then simply referenced in each output link.

They can also make constraints and other derivations easier to understand when set up as boolean values. Another 'for example': set up one called 'NewRecord' using the derivation needed to determine if a record is new, and set its value to TRUE or FALSE. Then simply refer to it later - "If NewRecord Then ... Else ...".

They are also about the only way, when working with repeating groups, to capture previous values and then compare them to current values in a Server job. Well, there is COMMON storage but a Stage Variable is a better answer nowadays.
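
A minimal sketch of these three uses, written in Python as an analogy rather than as Transformer derivations. It assumes "new" means the key differs from the previous row's key, and the column names `key`, `qty`, `price` and the two output lists are hypothetical.

```python
def run_transformer(rows):
    """Process rows roughly the way a Transformer with stage variables might."""
    prev_key = None             # stage variable: remembers the previous row's key
    out_new, out_all = [], []   # two hypothetical output links

    for row in rows:
        # Stage variables are evaluated first, in order, once per row.
        new_record = row["key"] != prev_key     # boolean flag
        total = row["qty"] * row["price"]       # derivation shared by both links

        # Output link constraints and derivations then just reference them.
        if new_record:
            out_new.append((row["key"], total))
        out_all.append((row["key"], total, new_record))

        prev_key = row["key"]   # evaluated last: carried over to the next row

    return out_new, out_all


rows = [
    {"key": "A", "qty": 2, "price": 5},
    {"key": "A", "qty": 1, "price": 5},
    {"key": "B", "qty": 3, "price": 4},
]
out_new, out_all = run_transformer(rows)
```

The ordering matters: the variable holding the previous key is assigned after the comparison, so each row is compared against the row before it.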

Posted: Fri Jul 02, 2004 3:22 pm
by alexysflores
[quote="willpeng"]Can anyone enlighten me on why I would want to use a DS transform instead of a DS routine, given that I am calling the routine from the transform as well?

My limited understanding is that using it as just a routine will degrade performance, but can anyone tell me why?

Willy[/quote]

I would advise you not to use either a DS Transform or a Routine if you have over a million rows of transactions; performance degrades, because it's still BASIC - far too slow.

Posted: Fri Jul 02, 2004 4:39 pm
by ray.wurlod
What's your alternative in server jobs? :?

I challenge you to do anything faster in server jobs than what can be done with BASIC expressions/routines.

Note that I didn't specify "what you can do with BASIC" - I specified "what can be done with BASIC".

Posted: Fri Jul 02, 2004 5:34 pm
by willpeng
If not a transform or a routine for over a million rows in DS, what else?

Is there some option or way to use a routine or function without it being called and cleaned up for each row?

I can probably code a routine that actually processes each row within that routine, but that defeats the purpose of the rapid development and GUI in DS.

Posted: Sun Jul 04, 2004 8:17 am
by jwhyman
You use a routine for logic that cannot be represented in a transform. A transform is an expression, whereas routines can contain statements. I am surprised that you say that BASIC is so slow; some of the functionality beneath its statements is very complex. Yes, it may be slower than taking the time to rewrite your custom logic in C/C++. You can do this in EE (PX). You can actually do it in Server using DSCAPI and write your own stage. If you are replicating existing functionality, it will, most likely, be slower.

A million rows is not large, and not many people run their jobs on toasters these days. Performance is about perception, expectation and need.