Column Generator Vs Transformer

Nageshsunkoji
Participant
Posts: 222
Joined: Tue Aug 30, 2005 2:07 am
Location: pune

Column Generator Vs Transformer

Post by Nageshsunkoji »

Hi Folks,

I have a requirement to create a new column called Empno and populate it with the value of the parameter EMPNO_INT_PARM.
I have two options for creating Empno:
1) Use a Transformer stage to create the column and populate it with the value.
2) Use a Column Generator stage to create the new column and generate it from the parameter value.
The Transformer would be in my job only to create this column; I am not performing any other functions in it.

Please suggest which one is the better option for creating the new column (I am using DataStage EE 7.5).

Thanks and Regards
Nagesh.
NageshSunkoji

If you know anything SHARE it.............
If you Don't know anything LEARN it...............
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In theory, at least, the Column Generator stage should be more efficient than a Transformer stage: (a) it does so much less work, and (b) it doesn't need to be compiled into a separate C++ function.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
manishsk
Participant
Posts: 13
Joined: Mon Mar 14, 2005 9:37 pm

Re: Column Generator Vs Transformer

Post by manishsk »

I believe the Transformer will be the better option, as far as both performance and simplicity of the code are concerned.

In a Transformer it is very simple to just add one column.
Also, from a maintenance perspective, if you later need to add some processing for that column, it will be better to use a Transformer.

As for the Column Generator, I believe you need to use a schema file. I have not used this stage much, but I think the Transformer will be the better option.

All> please share your thoughts, as this will also help us learn about the Column Generator stage and its uses.

Thanks & Regards,
Manish
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Manish,

You have made an assertion that the Transformer is better, but then you state that other people should test this for you.

The Column Generator was created for a purpose and should be used. Ray has already stated the major difference: a Transformer stage needs to be code-generated, compiled and linked into a job, which doesn't need to be done for a Column Generator. I don't know what a "schema file" is; you can use a schema within PX for column definitions, but you may also modify them in most places.

From a maintenance perspective, a Transformer can be used for many things, so the only way I know what a given one does is by its name. A Column Generator stage is pretty explicit, and I know what is probably going on "inside". Generating columns in a Transformer in PX jobs is, in my opinion, a bad maintenance option.

If you want, you can write a job to compare the performance of both and report back - but please don't ask others to do that work for you.
manishsk
Participant
Posts: 13
Joined: Mon Mar 14, 2005 9:37 pm

Post by manishsk »

Arnd,

Sorry, but maybe you misunderstood. I was not insisting that others do the work; I just asked for thoughts on this, as I have not used the stage extensively and was doubtful about it.

I was not aware of Ray's note, as I think Ray posted his at the same time I posted mine. :wink:

Apologies if I did something wrong.

Now, going by Ray's comment, yes, the Column Generator stage will be more efficient.
Still, I feel that for simplicity of the code and less development work the Transformer stage is a good choice.

Thanks & Regards,
Manish
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I just wrote a test job to compare the speed of a Transform against a Column Generator when generating a single VarChar(32) column filled with the constant value "Test". After several runs of 5 minutes each, the average speeds worked out to:

Transform Stage - 515,463 rows per second.
Column Generator - 555,565 rows per second.

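The column generator is roughly 7 to 8% faster on this machine (7.2% measured against the Column Generator rate, 7.8% measured against the Transform rate).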
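
For anyone who wants to check the arithmetic on the two rates above, here is a minimal Python sketch. The function and variable names are purely illustrative and not part of DataStage; the only inputs are the two measured rates. It shows that the percentage depends on which rate is taken as the baseline.

```python
def percent_faster(fast: float, slow: float) -> float:
    """Gain of `fast` over `slow`, expressed as a percentage of `slow`."""
    return (fast - slow) / slow * 100.0


def percent_slower(fast: float, slow: float) -> float:
    """Shortfall of `slow` against `fast`, expressed as a percentage of `fast`."""
    return (fast - slow) / fast * 100.0


# Average rates reported in the test above (rows per second).
transform_rate = 515_463
column_generator_rate = 555_565

faster = percent_faster(column_generator_rate, transform_rate)
slower = percent_slower(column_generator_rate, transform_rate)

print(f"Column Generator is {faster:.1f}% faster than the Transform")
print(f"Transform is {slower:.1f}% slower than the Column Generator")
```

With these inputs the first figure comes out at about 7.8% and the second at about 7.2%, so both describe the same measured gap from different baselines.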
Nageshsunkoji
Participant
Posts: 222
Joined: Tue Aug 30, 2005 2:07 am
Location: pune

Post by Nageshsunkoji »

Hi Arnd & ray,

Thanks a ton for your inputs.

I don't have a large volume of data to test with, which is the reason I posted the query here.

Thanks for your statistics.

Thanks and Regards,
Nagesh.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Nageshsunkoji wrote: I don't have huge data to test, that is the reason I posted the query here.

The Row Generator stage could also be used to produce a large volume of clean test data.

-Kumar