Modify Stage

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

ajith
Participant
Posts: 86
Joined: Thu Nov 10, 2005 11:10 pm

Modify Stage

Post by ajith »

I got a document on DS EE best practices. Here is what it says about the Modify stage:

"3.3.7 Modify stage

After DataStage release 7.5.1, Transformer stage performs better than Modify stage even for simple null handling operations. Moreover Modify stage breaks the metadata link between the stages. So it is not recommended to use Modify stage in the jobs."


Is this true?

Is it worse than a Transformer, or is this just another made-up argument?

I am shocked :?
Nageshsunkoji
Participant
Posts: 222
Joined: Tue Aug 30, 2005 2:07 am
Location: pune

Post by Nageshsunkoji »

Hi Ajith,

I am not sure about that statement on the Modify stage. In which document did you read these performance tips? I have some performance documents too, and they say the Modify stage is one of the more useful stages in DS Parallel. So far I haven't found any drastic performance degradation with the Modify stage when handling nulls.
NageshSunkoji

If you know anything, SHARE it.
If you don't know anything, LEARN it.
ajith
Participant
Posts: 86
Joined: Thu Nov 10, 2005 11:10 pm

Post by ajith »

It was a document compiled by one person in my organization. I am skeptical about this argument anyway.

I want to make sure, because if this is valid, a lot of jobs will have to be modified.

Thanks for your valuable input, Nagesh.
BalageBaju
Participant
Posts: 34
Joined: Fri Sep 22, 2006 10:59 pm
Location: India

Post by BalageBaju »

Ajith,

As far as I know, the Modify stage performs better than the Transformer stage. We mostly use the Modify stage in our jobs instead of the Transformer (wherever possible), and we try to avoid the Transformer stage because of performance.
Regards,
Balaji.
balajisr
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

It is true that the Transformer is more efficient in 7.5.1 than in previous versions, but the Modify stage should be at least as efficient as the Transformer, if not more so.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

There are far too many unsupported assertions in that document, which (if it is, or is based upon, the one I'm thinking of) you should not have: it's IBM Internal Confidential, produced by the Center of Excellence for use by IBM consultants.

While it is true that performance improvements have been made in the Transformer stage, it remains true that the very primitive Modify stage is very efficient precisely because it is primitive. Indeed, if you inspect the code generated when a Transformer stage is compiled, you are very likely to see modify operators used in that code!

Find the author, demand objective proof!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

I talked about this in my blog entry "Is the DataStage parallel transformer evil?", and my approach is to always go with the Transformer first, since it is the easiest and most user-friendly stage to use.

The Modify stage can be plain nasty. It's okay if you are just doing trimming, but if you need to perform more than one function on a field, forget about it; and if you haven't used it before and need to do several types of functions, you could spend hours getting the syntax right. The Transformer, on the other hand, helps you with the syntax through the right-click menu and syntax checking.

I would only use the Modify stage if I needed to eke some extra performance out of a job, so I would add it after completing my job design and discovering in performance testing that the job was too slow. Even then, I wouldn't be surprised to get only a 2% performance improvement.
rameshrr3
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

What I encountered was that errors generated while using functions in the Modify stage were far more difficult to correct and took many iterations. I almost gave up on PX when confronted with the Modify stage :( . On the theoretical side, however, I'm led to believe that an equivalent transformation done by the Modify stage will be faster than in a PX Transformer stage. I admit I'm still confused when faced with something that could be done by either stage.
balajisr
Charter Member
Posts: 785
Joined: Thu Jul 28, 2005 8:58 am

Post by balajisr »

It would be great if IBM could develop an expression editor, similar to the Transformer's, for writing specifications in the Modify stage. That would solve the problem to a certain extent.
khaja.arshad
Participant
Posts: 30
Joined: Mon May 29, 2006 11:19 pm

Re: Modify Stage

Post by khaja.arshad »

I suspect most DS users have had the same frustrating experience with the Modify stage.

IBM should at least come up with some documentation for the Modify stage.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I took some time to play with it and learn its idiosyncrasies. Its real value lies in how primitive an operator it is. It IS worth learning for all those little things (null handling, column name changes, data type changes) that you often have to do to get downstream stages to work properly.
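To make those "little things" concrete, here is a minimal Python sketch. It is not DataStage code and not how the Modify operator is implemented; it only mimics, on a plain dict record, the three kinds of per-column operation listed above: replacing a null with a default (similar in spirit to a null-handling specification), changing a data type, and renaming a column. The helper names (`handle_null`, `convert`, `rename`, `apply_ops`) and the columns `cust_id` and `amount` are hypothetical.

```python
from decimal import Decimal
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Op = Callable[[Record], Record]

def handle_null(col: str, default: Any) -> Op:
    """Hypothetical op: replace a None (null) value in `col` with `default`."""
    def op(rec: Record) -> Record:
        out = dict(rec)
        if out.get(col) is None:
            out[col] = default
        return out
    return op

def convert(col: str, to_type: Callable[[Any], Any]) -> Op:
    """Hypothetical op: convert the value in `col` with `to_type`."""
    def op(rec: Record) -> Record:
        out = dict(rec)
        out[col] = to_type(out[col])
        return out
    return op

def rename(src: str, dest: str) -> Op:
    """Hypothetical op: move the value of `src` into a column named `dest`."""
    def op(rec: Record) -> Record:
        out = dict(rec)
        out[dest] = out.pop(src)
        return out
    return op

def apply_ops(rec: Record, ops: List[Op]) -> Record:
    """Apply each simple operation in order, like a list of specifications."""
    for op in ops:
        rec = op(rec)
    return rec

# Null-default the amount, convert it to Decimal, then rename cust_id.
ops = [
    handle_null("amount", "0.00"),
    convert("amount", Decimal),
    rename("cust_id", "CustomerID"),
]

print(apply_ops({"cust_id": 42, "amount": None}, ops))
# prints {'amount': Decimal('0.00'), 'CustomerID': 42}
```

In this sketch the null default has to be applied before the conversion, because `Decimal(None)` raises a `TypeError`.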
Jboyd
Participant
Posts: 15
Joined: Mon Mar 14, 2011 12:55 pm

Post by Jboyd »

So there is nothing within the Modify stage you can do to preserve the metadata?
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

There is documentation for the Modify stage in the Parallel Job Developer's Guide for your version (assuming you are using v7.5.1 or later). In the 7.5.1 guide, it's Chapter 28, which covers most if not all of the available functions and their proper syntax.

To "preserve" your metadata in the visual sense (so that it's displayed on your output link columns), use table definitions: load them and/or manually add columns to the metadata grid. There is no Mapping tab as in other stages.

Internally, the operator itself will generate the proper output metadata to be shared with the next operator downstream. You can see this by adding the $OSH_PRINT_SCHEMAS environment variable.
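As a rough conceptual sketch of what "the operator generates the output metadata" means, the hypothetical Python below derives an output schema from an input schema plus a list of simple specifications. It is not the engine's actual logic. The rules are assumptions made for illustration: columns not mentioned pass through unchanged, a rename changes the name, a conversion changes the type, a drop removes the column, and null handling marks the column non-nullable. The `Column` class, the tuple spec format and the type strings are all hypothetical labels.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

@dataclass(frozen=True)
class Column:
    name: str
    dtype: str       # hypothetical type label, e.g. "int32"
    nullable: bool

# Hypothetical spec forms: ("handle_null", col), ("rename", src, dest),
# ("convert", col, new_type), ("drop", col)
Spec = Tuple[str, ...]

def _find(cols: List[Column], name: str) -> int:
    """Return the index of the column called `name`, or raise KeyError."""
    for i, c in enumerate(cols):
        if c.name == name:
            return i
    raise KeyError(f"no such column: {name}")

def derive_output_schema(schema: List[Column], specs: List[Spec]) -> List[Column]:
    """Apply specs in order to a copy of the input schema (assumed rules)."""
    cols = list(schema)  # columns not mentioned in any spec pass through
    for spec in specs:
        kind, idx = spec[0], _find(cols, spec[1])
        if kind == "handle_null":
            cols[idx] = replace(cols[idx], nullable=False)
        elif kind == "rename":
            cols[idx] = replace(cols[idx], name=spec[2])
        elif kind == "convert":
            cols[idx] = replace(cols[idx], dtype=spec[2])
        elif kind == "drop":
            del cols[idx]
        else:
            raise ValueError(f"unknown spec: {kind}")
    return cols

inp = [
    Column("cust_id", "int32", False),
    Column("amount", "string", True),
    Column("load_ts", "timestamp", True),
]
out = derive_output_schema(inp, [
    ("handle_null", "amount"),
    ("convert", "amount", "decimal[10,2]"),
    ("rename", "cust_id", "CustomerID"),
    ("drop", "load_ts"),
])
for c in out:
    print(c)
```

The point of the sketch is only that the output record layout is fully determined by the input layout plus the specifications, which is why the downstream operator can receive correct metadata even though the Modify stage has no Mapping tab.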

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.