6 Transformers Or 60 Stage Variables

ds_infy · Post by **ds_infy** » Mon Jan 24, 2011 2:34 pm

Hi,

I am at a crossroads, trying to choose between two designs.

Problem :

From a database I will receive 6 different types of levels. Based on the level (e.g. 010, 050, 030, 060) I need to:

a. Split the string into multiple columns
b. Validate each of these columns and concatenate all the error messages into a single string at each level
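To make the requirement concrete, here is a minimal Python sketch of the split-validate-accumulate logic, independent of how it ends up being built in DataStage. The fixed-width layouts, column names, and validation rules are all hypothetical, since the post does not describe the actual record formats.

```python
# Hypothetical validators; the real rules are not given in the thread.
def is_digits(value):
    return value.strip().isdigit()

def not_blank(value):
    return bool(value.strip())

# Hypothetical layouts: level code -> list of (column name, width, validator).
# Fixed-width splitting is an assumption; the source only says "split the string".
LAYOUTS = {
    "010": [("account", 6, is_digits), ("name", 10, not_blank)],
    "050": [("amount", 8, is_digits), ("currency", 3, not_blank)],
}

def split_fixed(record, layout):
    """Step a: split the string into named columns."""
    columns = {}
    pos = 0
    for name, width, _ in layout:
        columns[name] = record[pos:pos + width]
        pos += width
    return columns

def validate(level, record):
    """Step b: validate each column and join the error messages into one string."""
    layout = LAYOUTS.get(level)
    if layout is None:
        return {}, f"unknown level {level}"
    columns = split_fixed(record, layout)
    errors = [f"{name} invalid" for name, _, check in layout
              if not check(columns[name])]
    return columns, "; ".join(errors)
```

Each (column, validator) pair in a layout corresponds roughly to one stage variable, which is where a figure like 60 comes from if there are about ten checks per level.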

I can achieve this with either of the following designs:

Solution:
1. Use 60 stage variables to validate the data
or
2. Use 6 transformers to perform validation of each of these levels and create the ErrorMessage String

My question is, which is the better approach?
1. Have 60 stage variables
2. Have 6 transformers

Please suggest.

DSguru2B · Post by **DSguru2B** » Mon Jan 24, 2011 2:41 pm

I would go with the 6 transformers design unless you document what each and every stage variable is doing. It will be kinda hard to follow what the 60 variables are doing, even for you after a few months.

Post by **daignault** » Mon Jan 24, 2011 3:01 pm

Every time you use a transformer stage, the data is exported, processed by the C++ code in the transformer and re-imported with data validation into the DataStage run machine.

This is a very expensive operation to perform 6 times. Instead use a Switch stage to segment the data.

Ray D

ray.wurlod · Post by **ray.wurlod** » Mon Jan 24, 2011 4:03 pm

I'd advocate 60 meaningfully-named stage variables, then monitor the stage to determine its resource consumption. If that's less than 100%, leave it alone. Otherwise break it into two and monitor again. Repeat until no process is demanding more than 100% of one CPU.

ds_infy · Post by **ds_infy** » Mon Jan 24, 2011 5:07 pm

Forgot to tell you that the version of DS is 7.5.3, if it helps.

jwiles · Post by **jwiles** » Mon Jan 24, 2011 6:56 pm

I would go so far as to say 60 well-named stage variables with well-thought-out and efficiently-written derivations. Sure wouldn't want to see anything like this:

if input_link.thedate = StringToDate('1299-01-01') then 'NULL' else if input_link.thedate >= StringToDate('2000-01-01') and input_link.thedate <= StringToDate('2010-12-31') then 'VALID' else 'INVALID'

On the other hand, this type of code does generate billable work for me.
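One plausible reading of the objection to the derivation above is that the same three string-to-date conversions of constants are written inline, so they may be repeated for every row. The Python sketch below shows the general remedy of converting the constants once and reusing them. It is an analogy rather than DataStage code; the function and constant names are made up for illustration.

```python
from datetime import date

# Constants converted once, outside the per-row logic (names are hypothetical).
NULL_SENTINEL = date(1299, 1, 1)
RANGE_START = date(2000, 1, 1)
RANGE_END = date(2010, 12, 31)

def classify(the_date):
    """Per-row check that only compares against precomputed constants."""
    if the_date == NULL_SENTINEL:
        return "NULL"
    if RANGE_START <= the_date <= RANGE_END:
        return "VALID"
    return "INVALID"
```

In a transformer the analogous move would presumably be to hold the converted constants in stage variables or parameters and reference those in the derivation, though the thread does not spell this out.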

jwiles · Post by **jwiles** » Mon Jan 24, 2011 7:08 pm

daignault wrote: Every time you use a transformer stage, the data is exported, processed by the C++ code in the transformer and re-imported with data validation into the DataStage run machine.

At one time maybe so, but not since at least 7.5 and maybe 7.0 IIRC. Transformer-generated code is framework-native and while it can't match hand-rubbed C++ custom operators, it can be pretty darn good when well written. Most performance issues I see with transformers are due to poor derivation logic and/or job design practices by developers.

Although, writing 6 separate transformers wouldn't be recommended even now. That's just extra stages for the data to be transported between and to pass through.

Regards,

abc123 · Post by **abc123** » Thu Jan 27, 2011 5:33 pm

jwiles, just 2 questions about your comment.

1) What do you mean by "framework-native"?

2) Are you saying that after DS 7.5, a job with a transformer is not interpreted into C++ under the hood?

jwiles · Post by **jwiles** » Thu Jan 27, 2011 6:30 pm

abc123 wrote: jwiles, just 2 questions about your comment.

1) What do you mean by "framework-native"?

2) Are you saying that after DS 7.5, a job with a transformer is not interpreted into C++ under the hood?

1: Maybe not the best term, but: The executable created by the transformer compilation is not isolated from the Orchestrate framework by means of the export/import process as an external source would be, or a wrappered operator somewhat is. It can also be combined with other operators by Orchestrate at runtime.

2: No, I am not saying that. The compilation process converts the logic in transformer derivations to the Orchestrate transform language (a subset of C++), then plugs that into a C++ operator framework and compiles the resulting operator. You still are required to have a supported C++ compiler in order to use transformers in a parallel job.

And yes, you still see the transform operator named in your OSH. Many improvements have been made to the operator and the generated code to eliminate the notoriously slow performance of the earlier releases. A well-written transformer can have performance that approaches that of some native stages. I just really hate to hear that old paradigm still taught as the absolute truth.

Regards,

DSguru2B · Post by **DSguru2B** » Fri Jan 28, 2011 8:01 am

jwiles wrote: I just really hate to hear that old paradigm still taught as the absolute truth.

I agree. Some people swear by it.