Modify Stage
I got a document on DS EE best practices. It says this about the Modify stage:
"3.3.7 Modify stage
After DataStage release 7.5.1, Transformer stage performs better than Modify stage even for simple null handling operations. Moreover Modify stage breaks the metadata link between the stages. So it is not recommended to use Modify stage in the jobs."
Is this true?
Is it really worse than a Transformer, or is this just another made-up argument?
I am shocked.
-
- Participant
- Posts: 222
- Joined: Tue Aug 30, 2005 2:07 am
- Location: pune
- Contact:
Hi Ajith,
I am not sure about your statement about the Modify stage. In which document did you read these performance tips? I also have some performance documents, and they say that the Modify stage is one of the more useful stages in DS Parallel. So far, I haven't found any drastic performance degradation with the Modify stage while handling nulls.
NageshSunkoji
If you know anything SHARE it.............
If you Don't know anything LEARN it...............
-
- Participant
- Posts: 34
- Joined: Fri Sep 22, 2006 10:59 pm
- Location: India
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
There are far too many unsupported assertions in that document, which - if it is or is based upon the one of which I'm thinking - you should not have (it's IBM Internal Confidential, produced by the Center of Excellence for use by IBM consultants).
While it is true to claim that performance improvements have been made in the Transformer stage, it remains true that the very primitive Modify stage is very efficient precisely because it is primitive. Indeed, if you inspect the code generated when a Transformer stage is compiled, you are very likely to see modify operators used in that code!
Find the author, demand objective proof!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
- Contact:
I talked about this in my blog entry "Is the DataStage parallel transformer evil?". My approach is to always go with the Transformer first, since it is the easiest stage to use and the most user-friendly.
The Modify stage can be plain nasty. It's okay if you are just doing trimming but if you need to perform more than one function on a field forget about it, and if you haven't used it before and you need to do several types of functions you could spend hours getting the syntax right. The Transformer on the other hand helps you with the syntax with the right click menu and syntax checking.
I would only use the Modify stage if I needed to eke some extra performance out of a job, so I would add it after I had completed my job design and discovered in performance testing that it was too slow. Even then I wouldn't be surprised to get a 2% performance improvement.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
What I encountered was that errors generated while using functions in the Modify stage were far more difficult to correct and took many iterations. I almost gave up on PX when confronted with the Modify stage. On the theoretical side, however, I'm led to believe that an equivalent transformation done by the Modify stage will be faster than one done by a PX Transformer stage. I admit that I'm still confused when confronted with something that could be done by either stage.
-
- Participant
- Posts: 30
- Joined: Mon May 29, 2006 11:19 pm
Re: Modify Stage
I hope most of the DS users have the same frustrating experience with modify stage
IBM should at least come up with some documentation for the Modify stage.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I took some time to play with it, to learn its idiosyncrasies. Its very value is in how primitive an operator it is. It IS worth learning for all those little things (null handling, column name change, data type change) that you often have to do to get downstream stages to work properly.
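For anyone who has not seen them, the specifications for those little things look roughly like the sketch below. The column names are hypothetical, and the type change assumes a default conversion exists between the two types, so check the Modify stage chapter of the Parallel Job Developer's Guide for your version before relying on any of it.

```
custName = CUST_NM
acctBal:int32 = handle_null(ACCT_BAL, -1)
orderQty:int32 = ORDER_QTY
DROP tmpFlag
```

Each line is one Specification property in the stage: a column rename, a null replaced with a default value, a data type change, and a dropped column.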
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
There is documentation for the Modify stage in the Parallel Job Developer's Guide for your version (assuming you are using v7.5.1 or above). In the 7.5.1 doc, it's in Chapter 28 and includes most if not all of the available functions and the proper syntax.
To "preserve" your metadata in the visual sense (in that it's displayed on your output column links), make use of table definitions and load them and/or manually add columns to the metadata grid. There is no mapping tab as in other stages.
Internally, the operator itself will generate the proper output metadata to be shared with the next operator downstream. You can see this by adding the $OSH_PRINT_SCHEMAS environment variable.
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.