Will a sub-sequence improve performance?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Nageshsunkoji
Participant
Posts: 222
Joined: Tue Aug 30, 2005 2:07 am
Location: pune

Will a sub-sequence improve performance?

Post by Nageshsunkoji »

Hi Folks,

My sequence aborts sometimes and completes successfully at other times. I have nearly 30 jobs in the sequence, and 10 of them have big derivations. Sometimes Derivation 1 fails; sometimes Derivation 1 runs successfully but Derivation 2 fails. Sometimes the sequence completes without problems.

For all my aborting jobs it gives an abort message like: node_node1: Player 12 terminated unexpectedly.

I am calling the 3 derivations in parallel at once; for that I have 3 Sequencers inside a Sequence.

I am considering the following implementations to solve this issue.

1) Run the derivations one by one, that is, sequentially.
2) Create one more sub-sequence inside the sequence for all the derivation jobs.

Which implementation is better for getting out of this problem?

Your inputs are much appreciated.

Thanks & Regards
Nagesh.
NageshSunkoji

If you know anything SHARE it.............
If you Don't know anything LEARN it...............
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Nagesh,

you can't design any kind of workaround until you solve the cause of your jobs aborting. The error message doesn't mean much; it is like trying to diagnose why an engine isn't running from the red "motor" light on the panel.

Have you tried any diagnosis on the issue yet?
Nageshsunkoji
Participant
Posts: 222
Joined: Tue Aug 30, 2005 2:07 am
Location: pune

Post by Nageshsunkoji »

Arnd,

Thanks for your reply.

I have tried to identify the problem. But the main thing is that the same sequence works fine in Dev, while it fails in SYS and UAT. I got the following abort messages for all my aborting jobs.

node_node1: Player 12 terminated unexpectedly.

Fatal Error: waitForWriteSignal(): Premature EOF on node servername

buffer(10),1: Error in writeBlock - could not write 32

Fatal Error: APT_BufferOperator::writeAllData() write failed. This is probably due to a downstream operator failure.

Lookup1 : Fatal Error: Unable to allocate communication resources

Error in writeBlock - could not write 32

I got all the above fatal errors. I thought the fatal errors were due to running my 3 derivation jobs in parallel; that's why I am planning to run them sequentially.

Your inputs are most valuable.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Be ready to take up both options. You need to find the maximum CPU usage at each point in time and schedule the jobs accordingly. You can split up the job and run the pieces in parallel until CPU usage reaches the prescribed limit.
Also think about the total number of processes running on every node for the jobs called at that point.
Try to speak with your Unix SA to increase swap space and cache if required.
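(For reference, a minimal sketch of the kind of load snapshot meant above, assuming a POSIX-style shell; exact command names vary by Unix flavour, e.g. Solaris uses swap -s where Linux uses free.)

```shell
#!/bin/sh
# Rough per-node load snapshot to take while the sequence runs.
# Commands used are the common POSIX/Linux ones; adjust for your Unix.

# Total number of processes currently on this node
proc_count=$(ps -e | wc -l)
echo "processes running: $proc_count"

# One-line CPU/memory snapshot, if vmstat is available on this host
command -v vmstat >/dev/null 2>&1 && vmstat 1 2 | tail -1
```

Running this before and during the parallel derivation jobs gives a crude baseline for deciding how many jobs a node can take at once.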
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Nageshsunkoji
Participant
Posts: 222
Joined: Tue Aug 30, 2005 2:07 am
Location: pune

Post by Nageshsunkoji »

Hi Kumar,

Thanks for your response.

I have split the sequence into two parts, keeping some jobs in one sequence and the rest in the other. Now I am waiting for the result.

Regards
Nagesh
koolnitz
Participant
Posts: 138
Joined: Wed Sep 07, 2005 5:39 am

Post by koolnitz »

Nagesh,

As you mentioned, the same set of jobs runs fine in the DEV environment but fails in UAT. I suggest you try to figure out whether there is any significant difference in configuration (DS, OS and DB) between the two environments. Monitor the resource usage while running your jobs. As Kumar mentioned, check whether the swap space is set to an inappropriate value. The best thing would be to catch your Sys Admin.
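(As a rough illustration of comparing the two environments, a sketch assuming a POSIX shell; run it on each host and diff the two resulting files. The file name env_report.txt is just for illustration.)

```shell
#!/bin/sh
# Capture a comparable environment report on a host (run once on Dev,
# once on SYS/UAT, then diff the files). Command availability and
# output format vary across Unix flavours.
{
  echo "== OS ==";      uname -a
  echo "== ulimits =="; ulimit -a
  echo "== disk ==";    df -k
} > env_report.txt
echo "wrote env_report.txt"
```

Diffing the reports quickly surfaces differences in kernel version, resource limits, or free disk that could explain jobs dying only in SYS/UAT.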
Nitin Jain | India

If everything seems to be going well, you have obviously overlooked something.
Nageshsunkoji
Participant
Posts: 222
Joined: Tue Aug 30, 2005 2:07 am
Location: pune

Post by Nageshsunkoji »

Thanks, Nitin, for your response.

Hi All,

After splitting my big sequence into two sequences, I am still facing the same problem. It works in Dev but aborts in SYS and UAT many times with a Unix SIG kill, and then runs after some retries. Can I split those sequences again into two more, so four sequences in place of one? But I am not sure whether that will solve my problem.

As the DSXperts suggested, I will check the swap space and the other settings. Can I ask you all what I have to check in the different environments, and how to compare the two environments? Which parameters are important on the UNIX server?

Your inputs are very valuable to me.

Regards
Nagesh.
pneumalin
Premium Member
Posts: 125
Joined: Sat May 07, 2005 6:32 am

Post by pneumalin »

Nagesh,
Try turning off the Job Monitor in SYS and UAT via
DS Administrator -> Project -> Properties -> Environment -> look for APT_NO_JOB_MON in the Reporting section, change the value to "True", save it by clicking OK, and then run your job.
Let me know if this fixes your problem.
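(The same variable can also be set per invocation at the shell level rather than project-wide; a sketch, assuming dsjob is on the PATH of a DataStage host. The project and sequence names below are placeholders, not from this thread.)

```shell
#!/bin/sh
# Hypothetical per-run override: disable the job monitor for one run
# only, instead of changing the project default in Administrator.
export APT_NO_JOB_MON=True
echo "APT_NO_JOB_MON=$APT_NO_JOB_MON"

# On a DataStage host you would then launch the sequence, e.g.:
# dsjob -run -jobstatus MyProject MySequence
```

This is handy for testing whether the monitor is really the culprit before changing the setting project-wide.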
Nageshsunkoji
Participant
Posts: 222
Joined: Tue Aug 30, 2005 2:07 am
Location: pune

Post by Nageshsunkoji »

pneumalin,

Thanks for your response.

I tried changing the default value of the environment variable APT_MONITOR_SIZE from blank to 100000, but that did not completely solve my problem. Then I set the environment variable APT_NO_JOB_MON to true. Now my sequence completes successfully without any aborts. I ran only the one big sequence, and it still completed successfully.

Kudos to pneumalin and the others who guided me out of this problem.

But I still have one concern: if I set APT_MONITOR_SIZE to 100000 and APT_NO_JOB_MON to TRUE, will that affect performance in any way? Please let me know your comments on the same.

Regards,
Nagesh.

DSXCHANGE IS TOO POWERFUL AND AWESOME
pneumalin
Premium Member
Posts: 125
Joined: Sat May 07, 2005 6:32 am

Post by pneumalin »

You are welcome! It's always good to hear that someone has got out of a head-scratching problem!
Following are some suggestions for you to consider:
1. The performance of all the other jobs is not impacted by this change. Instead, it will improve greatly.
2. We turned it off at the project level in our production environment since we don't need it and don't want to waste resources there. The Job Monitor is a Java process that talks to the DS job at runtime to collect all the row-count information and post it back to the DS Engine.
3. There is a patch to address this problem in 7.5.1 if you want to fix it that way. Contact your IBM support, or directly upgrade the DS Server to 7.5.2.

Cheers.