Difference between Transformer Stage and all other stages

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

opdas
Participant
Posts: 115
Joined: Wed Feb 01, 2006 7:25 am

Difference between Transformer Stage and all other stages

Post by opdas »

Hi,
An interviewer asked me the difference between the Transformer stage and the rest of the stages.
I couldn't answer. Does anybody know?
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

What a pity! You know, you might have aced that interview if only you were a subscriber to my blog! I seem to remember a post on that very topic.

I wrote a blog post, "How do I get started with big expensive tools?", about how difficult it is to get DataStage employment when you don't have experience with the tool. You need to emphasise your skills with programming languages and databases, and your willingness to learn: the building blocks that help an ETL programmer. Then you need a bit of luck. Hopefully they are impressed by your honesty in admitting that you don't know the tool that well but are highly motivated to learn it.
prabu
Participant
Posts: 146
Joined: Fri Oct 22, 2004 9:12 am

Re: Difference between Transformer Stage and all other stages

Post by prabu »

opdas wrote: Hi,
An interviewer asked me the difference between the Transformer stage and the rest of the stages.
I couldn't answer. Does anybody know?

The "rest" stage is used to "relax" :D while the Transformer stage does all the hard work.

Jokes aside, the Transformer stage slows things down and is a holdover from server jobs. Use it as a last resort, when no other stage can meet your needs. That is, a Transformer can do everything any other PX stage does, but the converse may not be true.


I remember reading something like "the Transformer stage requires a context switch", similar to the SQL to PL/SQL context switching in Oracle.


Hope the DS gurus will explain further.


regards,
Prabu
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Including a Transformer stage in a parallel job makes the job take longer to compile, because source code has to be generated and compiled.
There is also a small overhead at run time for the initial loading of any libraries it needs. This does not apply to stages such as Modify, Switch and Copy, which can replace the Transformer in some places. It has been found that a job runs 25% faster when other stages replace the Transformer (where possible).
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

A basic Transformer in PX is like putting a Corvette in a school zone. Use it only if you absolutely have to. I think you are better off going through Vincent's blog posts or doing a bit more searching on this site.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
pneumalin
Premium Member
Posts: 125
Joined: Sat May 07, 2005 6:32 am

Post by pneumalin »

Hi Guys,
I don't agree with the claim that the PX Transformer stage performs badly. I posted a similar statement in defence of the PX Transformer a couple of months ago, but probably no one paid much attention to it at the time. In fact, a senior Ascential engineer confirmed to me that the advice in the Advanced Guide to avoid the PX Transformer was outdated and applied only to 7.0. They promised to remove that statement in the next documentation refresh.
In my experience, I couldn't see any significant difference in performance between the PX Transformer and other stages. I am interested in knowing how Kumar arrived at the figure of 25% faster. I agree with Kumar that compilation might take longer, since the job has to generate the source and pass it to the compiler when a Transformer is used. But once it is compiled as a shared object, that SO is loaded at run time as requested by the DS engine, just like the other SOs that contain the objects for stages such as Modify, Copy, etc. So in theory I don't think using the PX Transformer makes such a big difference. Please comment if you have other views on this.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

As I said in "Is the DataStage parallel transformer evil?", the benefits of the Transformer stage far outweigh any time overhead. I would always design with Transformers and replace them later if extra performance is required. There was some bad press about Transformers a couple of AscentialWorlds ago, and they didn't do a very good job of correcting that information.

One big improvement I've noted comes from replacing a Transformer that has constraints with a Filter stage followed by a Transformer. The Transformer does a transform followed by a filter, so you are transforming rows that then get filtered out. Putting the Filter stage first reverses the order and gives you a filter followed by a transform. This helps when you are removing a large number of rows, such as in a range-based lookup.
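
To make the ordering argument concrete, here is a minimal sketch in Python rather than DataStage. It only models the two orderings described in this post and assumes the constraint depends on input columns alone. The function names, the row layout and the 1-in-100 keep rate are all made up for illustration, and it says nothing about how the real stages are implemented.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


def transform_then_filter(
    rows: List[Row],
    transform: Callable[[Row], Row],
    keep: Callable[[Row], bool],
) -> Tuple[List[Row], int]:
    """Transform every row first, then drop rows that fail the constraint."""
    calls = 0
    out: List[Row] = []
    for row in rows:
        new_row = transform(row)  # work is done even for rows we discard
        calls += 1
        if keep(row):
            out.append(new_row)
    return out, calls


def filter_then_transform(
    rows: List[Row],
    transform: Callable[[Row], Row],
    keep: Callable[[Row], bool],
) -> Tuple[List[Row], int]:
    """Drop rows that fail the constraint first, then transform the survivors."""
    calls = 0
    out: List[Row] = []
    for row in rows:
        if keep(row):
            out.append(transform(row))
            calls += 1
    return out, calls


def keep_every_hundredth(row: Row) -> bool:
    # Hypothetical constraint: keeps 10 of the 1000 sample rows.
    return row["id"] % 100 == 0


def add_tax(row: Row) -> Row:
    # Hypothetical derivation standing in for the transform logic.
    return {**row, "amount_with_tax": row["amount"] * 1.1}


sample_rows: List[Row] = [{"id": i, "amount": i * 10.0} for i in range(1000)]
slow_out, slow_calls = transform_then_filter(sample_rows, add_tax, keep_every_hundredth)
fast_out, fast_calls = filter_then_transform(sample_rows, add_tax, keep_every_hundredth)
```

Both orderings produce the same output rows. The only difference is how many times the transform runs, which is why the gain grows with the share of rows being removed.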
opdas
Participant
Posts: 115
Joined: Wed Feb 01, 2006 7:25 am

Post by opdas »

I posted a topic a few days back comparing the performance of the Filter stage with a constraint in the PX Transformer stage. I noted that the PX Transformer was many times faster at filtering, and that the Filter stage's performance dwindled as the run progressed.
The filter was based on a string match.

This was measured in rows/sec.
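
For anyone who wants to check whether a filter's rows/sec drops over the course of a run, a rough sketch of the measurement in Python follows. It is not DataStage code. The function name, the batch size and the sample predicate are assumptions for illustration only.

```python
import time
from typing import Callable, List, Sequence


def throughput_by_batch(
    rows: Sequence[str],
    predicate: Callable[[str], bool],
    batch_size: int,
) -> List[float]:
    """Return rows/sec for each consecutive batch, so a slowdown over time is visible."""
    rates: List[float] = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        t0 = time.perf_counter()
        matched = sum(1 for value in batch if predicate(value))  # the filter work being timed
        elapsed = time.perf_counter() - t0
        # Guard against a zero reading from the timer on very small batches.
        rates.append(len(batch) / elapsed if elapsed > 0 else float("inf"))
    return rates


# Hypothetical data: a simple substring match over generated strings.
sample = ["customer_%d" % i for i in range(10000)]
rates = throughput_by_batch(sample, lambda s: "99" in s, batch_size=2500)
```

Comparing the first and last entries of `rates` gives a quick indication of whether throughput is dwindling as the run goes on.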