6 Transformers Or 60 Stage Variables

ds_infy
Premium Member
Posts: 59
Joined: Tue Jun 09, 2009 4:17 am
Location: India

Post by ds_infy »

Hi,

I am at a crossroads, trying to choose between two designs.

Problem :

From a database I will receive records at 6 different levels. Based on the level (e.g. 010, 050, 030, 060), I need to:

a. Split the string into multiple columns
b. Validate each of these columns and concatenate all the error messages into a single string for each level

I can achieve this with either of the following designs:

Solution:
1. Use 60 variables to perform validation of data
or
2. Use 6 transformers to perform validation of each of these levels and create the ErrorMessage String

My question is, which is the better approach?
1. Have 60 stage variables
2. Have 6 transformers

Please suggest.
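
For readers following along, here is a minimal sketch of the two steps described above (split the string by level, then validate each column and join the error messages into one string). It is written in Python purely for illustration and is not the poster's DataStage job. The level layouts, column names, offsets and validators are all hypothetical, since the thread never gives the real record formats.

```python
from typing import Callable, Dict, List, Tuple

# (column name, start offset, end offset, validator) -- all hypothetical.
Field = Tuple[str, int, int, Callable[[str], bool]]


def is_digits(value: str) -> bool:
    """True when the value, ignoring padding, contains only digits."""
    return value.strip().isdigit()


def is_not_blank(value: str) -> bool:
    """True when the value contains something other than padding."""
    return value.strip() != ""


# Assumed fixed-width layouts for two of the six levels.
LAYOUTS: Dict[str, List[Field]] = {
    "010": [("account_id", 0, 6, is_digits), ("name", 6, 16, is_not_blank)],
    "050": [("amount", 0, 8, is_digits), ("currency", 8, 11, is_not_blank)],
}


def validate(level: str, record: str) -> Tuple[Dict[str, str], str]:
    """Split one record by its level layout and collect validation errors.

    Returns the split columns and a single error string (empty if clean).
    """
    layout = LAYOUTS.get(level)
    if layout is None:
        return {}, f"unknown level {level}"
    columns: Dict[str, str] = {}
    errors: List[str] = []
    for name, start, end, check in layout:
        value = record[start:end]
        columns[name] = value
        if not check(value):
            errors.append(f"{name} is invalid")
    return columns, "; ".join(errors)
```

In this sketch each check plays roughly the role of one stage variable, and the per-level layout table plays the role of the branching on level. Whether that logic lives in one transformer or six is the design question being asked.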
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

I would go with the 6 transformers design unless you document what each and every stage variable is doing. It will be kinda hard to follow what the 60 variables are doing, even for you after a few months.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
daignault
Premium Member
Posts: 165
Joined: Tue Mar 30, 2004 2:44 pm
Contact:

Post by daignault »

Every time you use a transformer stage, the data is exported, processed by the C++ code in the transformer and re-imported with data validation into the Datastage run machine.

This is a very expensive operation to perform 6 times. Instead use a Switch stage to segment the data.

Ray D
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I'd advocate 60 meaningfully-named stage variables, then monitor the stage to determine its resource consumption. If that's less than 100%, leave it alone. Otherwise break it into two and monitor again. Repeat until no process is demanding more than 100% of one CPU.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ds_infy
Premium Member
Posts: 59
Joined: Tue Jun 09, 2009 4:17 am
Location: India

Post by ds_infy »

Forgot to tell you that the version of DS is 7.5.3, if it helps
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

I would go so far as to say 60 well-named stage variables with well-thought-out and efficiently-written derivations. Sure wouldn't want to see anything like this:

if input_link.thedate = StringToDate('1299-01-01') then 'NULL' else if input_link.thedate >= StringToDate('2000-01-01') and input_link.thedate <= StringToDate('2010-12-31') then 'VALID' else 'INVALID'

On the other hand, this type of code does generate billable work for me :)
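
The post does not spell out what is wrong with that derivation. One plausible reading is that the same constant strings are converted to dates on every row and the branches are nested in a single long expression. Here is a rough Python analogue of a tidier version, offered only as a sketch under that assumption. The constant and function names are made up for illustration and it is not DataStage syntax.

```python
from datetime import date

# Constants are converted once, outside the per-row logic, instead of
# being re-parsed from strings for every row.
NULL_MARKER = date(1299, 1, 1)
RANGE_START = date(2000, 1, 1)
RANGE_END = date(2010, 12, 31)


def classify_date(the_date: date) -> str:
    """Return 'NULL', 'VALID' or 'INVALID' using the rules from the example."""
    if the_date == NULL_MARKER:
        return "NULL"
    if RANGE_START <= the_date <= RANGE_END:
        return "VALID"
    return "INVALID"
```

The rules are the same as in the one-line derivation. Only the placement of the constant conversions and the readability of the branching differ.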
- james wiles


All generalizations are false, including this one - Mark Twain.
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

daignault wrote: "Every time you use a transformer stage, the data is exported, processed by the C++ code in the transformer and re-imported with data validation into the Datastage run machine."

At one time maybe so, but not since at least 7.5 and maybe 7.0 IIRC. Transformer-generated code is framework-native and while it can't match hand-rubbed C++ custom operators, it can be pretty darn good when well written. Most performance issues I see with transformers are due to poor derivation logic and/or job design practices by developers.

Although, writing 6 separate transformers wouldn't be recommended even now. That's just extra stages for the data to be transported between and to pass through.

Regards,
- james wiles


abc123
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

jwiles, just 2 questions about your comment.

1) What do you mean by "framework-native"?

2) Are you saying that after DS 7.5, a job with a transformer is not interpreted into C++ under the hood?
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

abc123 wrote: "jwiles, just 2 questions about your comment.

1) What do you mean by "framework-native"?

2) Are you saying that after DS 7.5, a job with a transformer is not interpreted into C++ under the hood?"

1: Maybe not the best term, but: The executable created by the transformer compilation is not isolated from the Orchestrate framework by means of the export/import process as an external source would be, or a wrappered operator somewhat is. It can also be combined with other operators by Orchestrate at runtime.

2: No, I am not saying that. The compilation process converts the logic in transformer derivations to the Orchestrate transform language (a subset of C++), then plugs that into a C++ operator framework and compiles the resulting operator. You still are required to have a supported C++ compiler in order to use transformers in a parallel job.

And yes, you still see the transform operator named in your OSH. Many improvements have been made to the operator and the generated code to eliminate the notoriously slow performance of the earlier releases. A well-written transformer can have performance that approaches that of some native stages. I just really hate to hear that old paradigm still taught as the absolute truth.

Regards,
- james wiles


DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

jwiles wrote: "I just really hate to hear that old paradigm still taught as the absolute truth."

I agree. Some people swear by it.