
Job processing speed vs. available nodes

Posted: Wed Feb 08, 2017 6:23 am
by zulfi123786
Hi,

There is a situation here that I can't fully comprehend.

A job is reading data from a sequential file, feeding it through a BASIC Transformer, and writing to a sequential file.

The BASIC Transformer calls a server routine which in turn calls Change() over 200 times, so it is CPU intensive.
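For readers unfamiliar with the pattern, the routine is essentially a long chain of substring replacements. A minimal sketch in Python rather than DataStage BASIC (the replacement pairs here are invented; the real routine has over 200 of them):

```python
# Sketch of a server routine that cascades substring replacements,
# analogous to 200 cascaded Change() calls in DataStage BASIC.
# The replacement pairs below are invented examples.
REPLACEMENTS = [
    ("&amp;", "&"),
    ("&lt;", "<"),
    ("&gt;", ">"),
    # ... the real routine would list 200+ pairs here ...
]

def clean_field(value: str) -> str:
    """Apply every replacement in order. Each pass rescans the whole
    string, which is why the routine is CPU-bound on large volumes."""
    for old, new in REPLACEMENTS:
        value = value.replace(old, new)
    return value
```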

The job takes 2 hours to complete when run on 2 nodes, and also 2 hours when run on 4 nodes.

I used HP Performance Manager to analyse the system load. Baseline CPU consumption was at 20% (24 processors in total on AIX), and this job added a further 20% on 2 nodes and 30% on 4 nodes, so there was still plenty of CPU headroom. Memory utilization was 85% in both cases while the job was running. Peak disk utilization was also very low, as was the CPU run queue.

When there was more CPU available, why did the job's processing not improve with the 2 additional nodes when run with the 4-node configuration file?

Thanks in advance

Posted: Wed Feb 08, 2017 7:20 am
by JRodriguez
Hi Zulfi,
The last time I used a BASIC Transformer it was not able to execute in parallel like the rest of the stages, and that might still be the case. However, nothing prevents you from splitting the data flow upstream based on a field value and sending it to multiple BASIC Transformers. In other words, implement your own parallelism and get the improvement you are looking for.

Regards

Posted: Wed Feb 08, 2017 11:52 pm
by ray.wurlod
In version 11, Change() function is available in the parallel Transformer stage. So is Ereplace() function.

So you may be able to replace your BASIC Transformer stage with a parallel Transformer stage. I guess it depends on the complexity of the server routine; but I would further guess that you can probably use stage variables and looping to achieve most things.

Posted: Thu Feb 09, 2017 5:07 am
by zulfi123786
Hi Ray,

These jobs were born in earlier versions, hence the server routine. The routine is nothing but cascaded calls to Change() to replace a list of 200 substrings. We could break it down into stage variables in a parallel Transformer, but that would need to be replicated in every place the server routine is called.

I would have moved it into a parallel routine, but those have a memory leak issue: the memory behind the pointer returned by the function is not released after the row is processed, and the job aborts when the data runs to millions of records.

I would appreciate anything relevant you can share from your vast experience as to why the job would not use the available CPU to improve performance when run on more nodes. Do you think this is being restricted at some level?

There are no limits on the userid, though.

Thanks

Posted: Thu Feb 09, 2017 8:10 am
by chulett
As noted, I would assume it's simply the fact that you have a BASIC Transformer in the job and that it has no parallel capability, thus creating a choke point. From what little I recall, it is restricted to running on one specific node (head? conductor? I don't recall which, but I'm sure someone knows for certain).

I imagine the score, if you dumped it, would confirm this.
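For anyone following along: the parallel job score can be dumped by setting the reporting environment variable in the job's parameters (a config fragment, shown with its usual value):

```
APT_DUMP_SCORE=True
```

The score then appears in the job log and shows, among other things, which processes each stage actually runs as and on which nodes, so it would reveal whether the BASIC Transformer is pinned to a single node.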

Posted: Thu Feb 09, 2017 10:57 am
by UCDI
You can write routines in other languages that use multi-threading if you really need the performance, or that can be called from a parallel Transformer, if splitting the flow across multiple copies of the BASIC Transformer to parallelize it manually is not sufficient.

It comes down to "what do you need done" and "how fast do you need it, really".

If you can't do what you need with DataStage's built-in tools and BASIC isn't fast enough... this should be a pretty rare situation, but when it hits, DataStage provides hooks to solve the problem.

Posted: Sat Feb 11, 2017 8:16 am
by zulfi123786
chulett wrote:As noted I would assume it's simply the fact that you have a BASIC Transformer in the job and that it has no parallel capability, thus creating a choke point. From what little I recall, it is restricted to running on one specific node - head? conductor?
It's an SMP server with just one physical node, so all logical nodes are running on the same box. The BASIC Transformer is running in parallel, as the job monitor shows multiple instances of this stage.

Posted: Sat Feb 11, 2017 8:19 am
by zulfi123786
UCDI wrote:you can make routines in other languages that can use multi-threading if you really need the performance
The question here is not about improving performance; it's about why the job is not utilizing the additional free resources when more nodes are made available to it.

Posted: Sat Feb 11, 2017 8:28 am
by chulett
Ah... the SMP notes would have been good to know up top. As to your last question, isn't that all up to the operating system rather than DataStage? And aren't you worrying about the "additional free resources" precisely so that performance improves? :wink: