Basic Transformer in Parallel job

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Suman
Participant
Posts: 53
Joined: Thu Oct 07, 2004 8:28 am

Basic Transformer in Parallel job

Post by Suman »

I have a server routine to be called from a transformer. As a normal transformer in a parallel job cannot call a server routine, I am thinking of using a Basic Transformer instead. But as it reduces performance a lot, I want to know if any statistics exist showing how much performance will be hampered by using the Basic Transformer in a parallel job. Is it better to use a server job than a Basic Transformer in a parallel job?
johnthomas
Participant
Posts: 56
Joined: Mon Oct 16, 2006 7:32 am

Post by johnthomas »

Try using a shared container instead for calling the server routine from a transformer.
JT
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

If you could re-do the Basic routine in C, that would be great. That way you can call the routine in a 'normal' transformer. This is, of course, assuming the data volume is expected to be large.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If you don't have statistics how can you assert that "it reduces performance a lot"?!

Depending on the data volumes it may be better (easier) to use a server job.

Are you sure that the server routine's functionality could not be migrated to a parallel routine or BuildOp stage?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Suman
Participant
Posts: 53
Joined: Thu Oct 07, 2004 8:28 am

Post by Suman »

johnthomas wrote:try using shared container instead for calling the server routine from a transformer
The Shared Container option is fine when the server routine is applied to each record coming from the input. But here the server routine is used only if no value is found for a particular field inside the transformer. The logic is:

If IsNull(A) Then ServerRoutineValue Else A

where A is the field value. So the shared container's value should be taken only if field A is null.
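For what it's worth, the conditional fallback described above is simple enough to express as an external C function for a parallel transformer. This is a minimal sketch only; the function and argument names are illustrative, and the actual server routine's computation is not shown:

```c
#include <stddef.h>

/* Sketch of the derivation "If IsNull(A) Then ServerRoutineValue Else A"
   as a C function callable from a parallel transformer.
   value_or_fallback() and its argument names are hypothetical. */
const char *value_or_fallback(const char *a, const char *routine_result)
{
    /* A null (or empty) field A means the routine's result is used. */
    if (a == NULL || a[0] == '\0')
        return routine_result;
    return a;  /* Field A has a value: pass it through unchanged. */
}
```

In the transformer, the derivation would then call this function with field A and whatever replaces the server routine's result.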
Suman
Participant
Posts: 53
Joined: Thu Oct 07, 2004 8:28 am

Post by Suman »

ray.wurlod wrote:If you don't have statistics how can you assert that "it reduces performance a lot"?!

Depending on the data volumes it may be better (easier) to use a server job.

Are you sure that the server ro ...
The server job already exists and takes around 17-20 secs. There is no parallel job yet. The server job now has to be converted to a parallel job, and the options I am considering are a Basic Transformer or a parallel routine written in C. Ascential recommends against using the Basic Transformer in parallel jobs because it reduces performance; I came to know about this from my colleagues.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Suman wrote:Server job now has to be converted to parallel job
Why?

If it isn't broken, don't fix it.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

It doesn't "have to" be changed. Argue your case. If that's of no avail, then at least post the logic; maybe we can help you build a C routine.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
johnthomas
Participant
Posts: 56
Joined: Mon Oct 16, 2006 7:32 am

Post by johnthomas »

Suman,

As per your reply, "server routine is used only if there is no value found for a particular field". This could be achieved using the Switch and Merge stages. Also, since you can include a server job in a sequence job along with parallel jobs, I would go with what Ray has commented: "don't fix it unless it's broken".
JT
Suman
Participant
Posts: 53
Joined: Thu Oct 07, 2004 8:28 am

Post by Suman »

ray.wurlod wrote:
Suman wrote:Server job now has to be converted to parallel job
Why?

If it isn't broken, don't fix it. ...
A parallel job is required because I am converting all server jobs into parallel jobs to improve the performance of the whole process. The output of the existing server job is a hashed file, which is used for lookups in the next few jobs, which will be parallel jobs. As there is no hashed file in a parallel job, a dataset is required for the lookup instead, and that is the reason for a parallel job rather than a server job.

Suman
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Suman wrote:Parallel job is required as I am converting all server jobs into parallel jobs to improve the performance of the whole process.
It won't.

For small tasks server jobs, at versions earlier than 8.0, will always be more efficient (finish faster) than the equivalent parallel job. This is mainly because of the much greater startup costs of parallel jobs.

There is no reason not to use both server jobs and parallel jobs. They can be started from the same job sequence.

If you want to get rid of the hashed file, write a parallel job that loads a Lookup File Set and use that instead.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Build the hashed file and dump its contents to a sequential file. Then use that sequential file to build your lookup file set or dataset.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Suman wrote:Parallel job is required as I am converting all server jobs into parallel jobs to improve the performance of the whole process.
You'll actually decrease 'the performance of the whole process' by doing that. As noted, smaller tasks will be more efficient and take less time as Server jobs. And someone sold you a bill of goods if they told you that you 'needed' to do this just because you upgraded to EE. Keep a mixture of the two job types. Convert only what would benefit from the conversion.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Suman
Participant
Posts: 53
Joined: Thu Oct 07, 2004 8:28 am

Post by Suman »

I have written the C program and created the .o file. But during compilation I am getting the following errors in the transformer:

##E TBLD 000000 02:19:10(000) <main_program> Error when checking composite operator: Subprocess command failed with exit status 256.
##E TFSR 000019 02:19:10(001) <main_program> Could not check all operators because of previous error(s)
##W TFCP 000000 02:19:10(002) <transform> Error when checking composite operator: The number of reject datasets "0" is less than the number of input datasets "1".
##W TBLD 000000 02:19:10(003) <main_program> Error when checking composite operator: Output from subprocess: "/opt/ds/app/ETLDev/RT_BP633.O/V0S3_SampleRoutinetest_Transformer_3.C", line 523: error #2390: function "main" may not be called or have its address taken
output0Int32B[0]=main();
^

##W TBLD 000000 02:19:10(004) <main_program> Error when checking composite operator: Output from subprocess:

##I TFCP 000000 02:19:10(005) <transform> Error when checking composite operator: /opt/aCC/bin/aCC -L/opt/ds/app/ETLDev/RT_BP633.O/ -L/home/dsadm/Ascential/DataStage/PXEngine/lib -L/home/dsadm/Ascential/DataStage/PXEngine/user_lib +DD64 -b -Wl,+s -Wl,+vnocompatwarnings -lorchhpia64 -lorchcorehpia64 -lorchbuildophpia64 /home/skundu/Test/Read1.o /opt/ds/app/ETLDev/RT_BP633.O/V0S3_SampleRoutinetest_Transformer_3.tmp.o -o /opt/ds/app/ETLDev/RT_BP633.O/V0S3_SampleRoutinetest_Transformer_3.so.
##W TBLD 000000 02:19:10(006) <main_program> Error when checking composite operator: Output from subprocess: 1 error detected in the compilation of "/opt/ds/app/ETLDev/RT_BP633.O/V0S3_SampleRoutinetest_Transformer_3.C".

##W TBLD 000000 02:19:10(007) <main_program> Error when checking composite operator: Output from subprocess: aCC: warning 1913: `/opt/ds/app/ETLDev/RT_BP633.O/V0S3_SampleRoutinetest_Transformer_3.tmp.o' does not exist or cannot be read

##W TBLD 000000 02:19:10(008) <main_program> Error when checking composite operator: Output from subprocess: ld: Can't find library or mismatched ABI for -lorchhpia64
Fatal error.

Do I need to change the library path in the environment variable LD_LIBRARY_PATH, or change any other setting? One of the errors is that the linker is not finding the library.

Any idea about these errors will be helpful.
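Regarding the library-path question: the ld error ("Can't find library or mismatched ABI for -lorchhpia64") does suggest the linker is not picking up the 64-bit PX libraries at run time. A minimal sketch of checking and prepending the PXEngine library directory (the path is copied from the aCC command line in the log above; whether your HP-UX environment also needs SHLIB_PATH set is an assumption to verify locally):

```shell
#!/bin/sh
# Hypothetical fix sketch: ensure the PXEngine lib directory is on the
# runtime library path before compiling the job. The directory below is
# taken from the aCC command line in the error log.
PXLIB=/home/dsadm/Ascential/DataStage/PXEngine/lib

# Prepend only if it is not already present.
case ":${LD_LIBRARY_PATH:-}:" in
    *":$PXLIB:"*) ;;  # already on the path, nothing to do
    *) LD_LIBRARY_PATH="$PXLIB${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
esac
export LD_LIBRARY_PATH

echo "$LD_LIBRARY_PATH"
```

If the variable is set in your dsenv or job environment rather than the shell, the same check applies there.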
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

When you test your routine from the command line, does it work? How are you compiling the source code? Make sure you compile it with the +Z option.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
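One more thing the log above hints at: the generated line `output0Int32B[0]=main();` shows the external function was apparently named main, which a C compiler will not allow to be called directly (that is exactly error #2390). A minimal sketch of the fix, using a hypothetical routine name (and, per the +Z advice above, compiled as position-independent code):

```c
/* The external routine must NOT be called main(); give it its own name
   and reference that name in the transformer. read_value() is a
   hypothetical name standing in for the real routine. */
int read_value(void)
{
    /* The routine's real logic would go here. */
    return 0;
}
```

The transformer's function definition then points at read_value rather than main.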