Modify Stage

Posted: Thu Nov 30, 2006 5:19 am
by ajith
I got a document on DS EE best practices. It says this about the Modify stage:

"3.3.7 Modify stage

After DataStage release 7.5.1, Transformer stage performs better than Modify stage even for simple null handling operations. Moreover Modify stage breaks the metadata link between the stages. So it is not recommended to use Modify stage in the jobs."


Is this true?

Is it really worse than a Transformer, or is this just another made-up argument?

I am shocked :?

Posted: Thu Nov 30, 2006 5:45 am
by Nageshsunkoji
Hi Ajith,

I am not sure about your statement about the Modify stage. In which document did you read these performance tips? I have some performance documents too, and they say that Modify is one of the more useful stages in DS Parallel. So far I haven't found any drastic performance degradation with the Modify stage while handling nulls.

Posted: Thu Nov 30, 2006 6:06 am
by ajith
It was a document compiled by one person in my organization. I am skeptical about this argument anyway.

I want to make sure because, if this is valid, a lot of jobs will have to be modified.

Thanks for your valuable input, Nagesh.

Posted: Thu Nov 30, 2006 6:28 am
by BalageBaju
Ajith,

As far as I know, the Modify stage performs better than the Transformer stage. We mostly use the Modify stage in our jobs instead of the Transformer (wherever possible), and we try to avoid the Transformer stage because of performance.

Posted: Thu Nov 30, 2006 6:50 am
by balajisr
It is true that the Transformer is more efficient in 7.5.1 than in previous versions, but the Modify stage should be at least as efficient as the Transformer, if not more so.

Posted: Thu Nov 30, 2006 1:56 pm
by ray.wurlod
There are far too many unsupported assertions in that document, which - if it is or is based upon the one of which I'm thinking - you should not have (it's IBM Internal Confidential, produced by the Center of Excellence for use by IBM consultants).

While it is true to claim that performance improvements have been made in the Transformer stage, it remains true that the very primitive Modify stage is very efficient precisely because it is primitive. Indeed, if you inspect the code generated when a Transformer stage is compiled, you are very likely to see modify operators used in that code!

Find the author, demand objective proof!

Posted: Thu Nov 30, 2006 5:32 pm
by vmcburney
I talked about this in my blog entry "Is the DataStage parallel transformer evil?" My approach is to always go with the Transformer first, since it is the easiest and most user-friendly stage to use.

The Modify stage can be plain nasty. It's okay if you are just doing trimming but if you need to perform more than one function on a field forget about it, and if you haven't used it before and you need to do several types of functions you could spend hours getting the syntax right. The Transformer on the other hand helps you with the syntax with the right click menu and syntax checking.

I would only use the Modify stage if I needed to eke some extra performance out of a job, so I would add it after I had completed my job design and discovered in performance testing that it was too slow. Even then I wouldn't be surprised to get a 2% performance improvement.

Posted: Fri Dec 01, 2006 5:26 am
by rameshrr3
What I encountered was that errors generated while using functions in the Modify stage were far more difficult to correct and took many iterations. I almost gave up on PX when confronted with the Modify stage :( . On the theoretical side, however, I'm led to believe that an equivalent transformation done by the Modify stage will be faster than one done by a PX Transformer stage. I admit that I'm still confused when confronted with something that could be done by either stage.

Posted: Fri Dec 01, 2006 6:07 am
by balajisr
It would be great if IBM could develop an expression editor, similar to the Transformer's, for writing the modify specification in the Modify stage. This would solve the problems to a certain extent.

Re: Modify Stage

Posted: Fri Dec 01, 2006 6:20 am
by khaja.arshad
I expect most DS users have had the same frustrating experience with the Modify stage.

IBM should at least come up with some documentation for the Modify stage.

Posted: Fri Dec 01, 2006 2:46 pm
by ray.wurlod
I took some time to play with it, to learn its idiosyncrasies. Its very value is in how primitive an operator it is. It IS worth learning for all those little things (null handling, column name change, data type change) that you often have to do to get downstream stages to work properly.
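
For anyone who hasn't used the stage, here is a rough Python sketch of what those three "little things" amount to on a single record. It is only an illustration of the idea: it is not Modify specification syntax, it is not how DataStage implements the operator, and the column names (cust_id, cust_name, qty) and the function name are made up for the example.

```python
from typing import Any, Dict, Optional

# A record is modelled as a plain dict; None stands in for a null column.
Record = Dict[str, Optional[Any]]


def modify_like(record: Record) -> Record:
    """Apply three simple per-column operations to one record."""
    out = dict(record)  # work on a copy, leave the input untouched

    # Null handling: replace a null with a default value.
    if out.get("qty") is None:
        out["qty"] = 0

    # Data type change: string column converted to an integer.
    out["cust_id"] = int(out["cust_id"])

    # Column name change: cust_name becomes customer_name.
    out["customer_name"] = out.pop("cust_name")

    return out


# Hypothetical input row with a null quantity and a string key.
row = {"cust_id": "42", "cust_name": "Acme", "qty": None}
result = modify_like(row)
print(result)
```

Each operation touches one column at a time and needs no knowledge of the rest of the row, which is the sense in which the operator is "primitive".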

Posted: Thu Jul 14, 2011 10:54 am
by Jboyd
So there is nothing within the Modify stage you can do to preserve the metadata?

Posted: Thu Jul 14, 2011 11:48 am
by jwiles
There is documentation for the Modify stage in the Parallel Job Developer's Guide for your version (assuming you are using v7.5.1 or above). In the 7.5.1 doc, it's in Chapter 28 and includes most if not all of the available functions and the proper syntax.

To "preserve" your metadata in the visual sense (in that it's displayed on your output column links), make use of table definitions and load them and/or manually add columns to the metadata grid. There is no mapping tab as in other stages.

Internally, the operator itself will generate the proper output metadata to be shared with the next operator downstream. You can see this by adding the $OSH_PRINT_SCHEMAS environment variable.

Regards,