Difference/Advantage of using Transform instead of Routine
Can anyone enlighten me on why I would want to use a DS transform instead of a DS routine, given that I am calling the routine from the transform as well?
My limited understanding is that using it as just a routine will degrade performance, but can anyone tell me why?
Willy
Don't really have time for the Full Wurlod, but in a nutshell...
A routine is called by the job when it runs, so the disadvantage is the overhead of the context switching and the passing of the arguments back and forth. This can become significant when dealing with large data volumes. The advantage is that any changes made to a routine are automatically picked up by any job that uses it the next time it runs.
A transform is a piece of code that is substituted in by the compiler at compile time, so there is no 'overhead' associated with it. The downside is that you have to recompile any jobs that use the transform before they will see the change.
-craig
"You can never have too many knives" -- Logan Nine Fingers
There's no context switching when calling a routine; the only overhead is determining its location (via the Catalog, or VOC file) and loading it into memory (this process is sometimes called "link snapping"); after that, the in-memory location is cached.
As part of loading the routine, stack entries must be constructed for normal and error return, as well as for argument passing. Later, these stack entries must be deallocated.
Even so, the comparison is between some overhead and none.
A Transform is a single expression, stored with associated documentation (and references to data elements, if you use these) in the Repository. When a job that uses a Transform is compiled, the Transform's defining expression is copied into the job as in-line code.
Some Transforms call Routines; in this case that advantage is lost. The advantage now is, for example, having something meaningfully named that a developer can use (such as DIGITS) rather than requiring the developer to learn the arcane underlying function.
There is a limit to what can be done in a single expression, and some things (such as searching a dynamic array) simply cannot be done in expressions, because they require statements. In this case, you are forced into using a Routine, since it's the only way to achieve the task.
Routines that are transform functions are called for every row processed, so should be as lightweight and efficient as possible. Before/after subroutines, on the other hand, are executed only once per job run, so can be quite heavy duty.
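To make the in-line versus called distinction concrete, here is a rough analogy in Python rather than DataStage BASIC. The function name digits_routine and the sample rows are made up for illustration, loosely modeled on the DIGITS example above; this is a sketch of the idea under those assumptions, not a picture of how the DataStage compiler actually works.

```python
# Analogy only: Python stands in for DataStage BASIC here.

def digits_routine(value: str) -> str:
    """Hypothetical stand-in for a Routine: invoked once per row."""
    return "".join(ch for ch in value if ch.isdigit())

# Made-up sample rows.
rows = ["AB12", "9-9-9", "none"]

# "Routine" style: one function call (with argument passing) per row.
via_routine = [digits_routine(row) for row in rows]

# "Transform" style: the defining expression is pasted in-line,
# so no call is made for each row.
via_inline = ["".join(ch for ch in row if ch.isdigit()) for row in rows]

assert via_routine == via_inline
```

Both lists come out identical; the only difference is whether a call is made for each row, which is the overhead being discussed.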
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod wrote: There's no context switching when calling a routine; the only overhead is determining its location (via the Catalog, or VOC file) and loading it into memory (this process is sometimes called "link snapping"); after that, the in-memory location is cached.
Ah... thanks for the clarification. It was explained to me once that way, and I've been merrily passing it along ever since. Silly Wabbit.
-craig
Stage Variables are very handy and can be used to simplify things. They are evaluated (in order) before the derivations in your output links, so they can be used to cut down on the amount of work done in the Transformer.
For example, a complex derivation that is used in multiple output links can be put in a Stage Variable, evaluated once and then simply referenced in each output link.
They can also make constraints and other derivations easier to understand when set up as boolean values. As another example, set up one called 'NewRecord' using the derivation needed to determine if a record is new and set its value to TRUE or FALSE. Then simply refer to it later - "If NewRecord Then ... Else ...".
They are also about the only way, when working with repeating groups, to capture previous values and then compare them to current values in a Server job. Well, there is COMMON storage but a Stage Variable is a better answer nowadays.
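As a rough sketch of the previous-value pattern, here is the idea in Python rather than in a Transformer. The names process, prev_key and new_record are hypothetical, and "new" is assumed here to mean that the key differs from the previous row's key; a real job would use whatever derivation defines a new record.

```python
def process(rows):
    """Mimic stage variables: evaluated in order, once per row,
    before the output derivations that reference them."""
    prev_key = None  # plays the role of a stage variable holding the previous key
    out = []
    for row in rows:
        # Stage variable 1: compare the current key with the remembered one.
        new_record = row["key"] != prev_key
        # Stage variable 2: updated afterwards, so the test above saw the old value.
        prev_key = row["key"]
        # Output derivation: simply references the already-computed boolean.
        out.append({"key": row["key"], "flag": "NEW" if new_record else "REPEAT"})
    return out

result = process([{"key": "A"}, {"key": "A"}, {"key": "B"}])
```

The order matters: the comparison runs before the remembered value is overwritten, which mirrors the in-order evaluation of Stage Variables.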
-craig
[quote="willpeng"]Can anyone enlighten me on why I would want to use a DS transform instead of a DS routine, given that I am calling the routine from the transform as well?
My limited understanding is that using it as just a routine will degrade performance, but can anyone tell me why?
Willy[/quote]
I would advise you not to use either a DS Transform or a Routine if you have over a million rows of transactions; performance degrades. It's still BASIC, which is far too slow.
What's your alternative in server jobs?
I challenge you to do anything faster in server jobs than what can be done with BASIC expressions/routines.
Note that I didn't specify "what you can do with BASIC" - I specified "what can be done with BASIC".
IBM Software Services Group
If not a transform or routine for over a million rows in DS, what else?
Is there some option or way to call a routine or function without the call being set up and cleaned up for each row?
I can probably code a routine that processes every row within the routine itself, but that defeats the purpose of the rapid development and GUI in DS.
William Peng
DW/ETL Consultant
Middletown, NJ
You use a routine for logic that cannot be represented in a transform. A transform is an expression, whereas routines can contain statements. I am surprised that you say that BASIC is so slow; some of the functionality beneath the statements is very complex. Yes, it is maybe slower than taking the time to rewrite your custom logic in C/C++. You can do this in EE (PX). You can actually do it in Server too, by using DSCAPI and writing your own stage. If you are replicating existing functionality, it will most likely be slower.
A million rows is not large, and not many people run their jobs on toasters these days. Performance is about perception, expectation and need.