ETL errors while running batch schedule

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

lakshya
Participant
Posts: 19
Joined: Fri Jan 21, 2005 2:39 pm

ETL errors while running batch schedule

Post by lakshya »

Hi-

We are getting the following errors while running our batch schedule. Our batch runs group-wise based on dependencies, where a bunch of jobs kick off at the same time. The jobs run fine when run individually, but as a group they start throwing the assorted errors listed below. Is there a limit on the number of jobs we can initiate at the same time, or is there some other issue with the jobs?

1 : main_program: Fatal Error: Service table transmission failed for node1

2 : (ps): Broken pipe. This may indicate a network problem. Setting the environment
variable APT_PM_CONDUCTOR_TIMEOUT to a larger value (when unset, it defaults to 60)
may alleviate this problem

3 : Wd. (fatal error from ): Error executing phantom command =>
DSD.OshMonitor record has been created in the '&PH&' file.
Unable to create PHANTOM process.

4 : node1: Fatal Error: Unable to start ORCHESTRATE process on node
node1 (ps): APT_PMPlayer::APT_PMPlayer: fork() failed, Resource
temporarily unavailable

5 : main_program: **** Parallel startup failed ****
This is usually due to a configuration error, such as
not having the Orchestrate install directory properly
mounted on all nodes, rsh permissions not correctly
set (via /etc/hosts.equiv or .rhosts), or running from
a directory that is not mounted on all nodes. Look for
error messages in the preceding output.

6 : main_program: Unable to contact one or more Section Leaders.
Probable configuration problem; contact Orchestrate system administrator.

7 : node1: Fatal Error: Unable to start ORCHESTRATE process on node
node1 (ps): APT_PMPlayer::APT_PMPlayer: fork() failed, Resource
temporarily unavailable

Please let us know if we need to modify something on our side to help resolve this issue.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Looks like your system can't handle the total load, as indicated by the "Unable to create PHANTOM process" message. "PHANTOM" is just DataStage terminology for a background process. Involve your UNIX administrator to check the size of the process table and the per-user limit on the number of processes.
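For example, a quick health check looks something like this (exact flags vary by UNIX flavour, so treat it as a sketch; dsadm is an example user):

ps -ef | wc -l              # how many processes are in the table right now
ps -fu dsadm | wc -l        # how many belong to the DataStage user
ulimit -a                   # per-user limits for the current shell, including max processes

If the second number gets anywhere near the per-user process limit when the batch kicks off, you've found your culprit.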
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

I have run into this problem a lot. It happens when the box configuration has not been tuned; meaning, the cache space for OSH should be increased from its default (probably 256MB) to 1GB, which should be enough. This has to be done by the UNIX admin.
Ray, please comment or correct me.
Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I don't believe that insufficient cache space for OSH would lead to the "broken pipe" error that was reported. This looks much more like a timeout, possibly caused by too many processes on the machine (and therefore too long a wait to start another process).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

It seems you have already raised APT_PM_CONDUCTOR_TIMEOUT in the Administrator to avoid timeouts,
yet you still get the error mentioned in point 4.
It also seems the conductor process cannot reach the section leader process on each processing node. That clearly means your server is overloaded.
Try to reduce the number of jobs called in parallel in the job sequence.
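If you do need to raise the timeout server-wide rather than per project, it can be exported in the dsenv file, something like this ($DSHOME and the value 120 are just examples for your admin to adapt):

# In $DSHOME/dsenv - give the conductor longer to hear from the section leaders
APT_PM_CONDUCTOR_TIMEOUT=120
export APT_PM_CONDUCTOR_TIMEOUT

Jobs pick the new value up once the DataStage server processes have been restarted with the updated dsenv.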
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
lakshya
Participant
Posts: 19
Joined: Fri Jan 21, 2005 2:39 pm

Post by lakshya »

Hi All-

Thanks for your responses on the topic. The issue is resolved now.

We increased the number of processes allowed per user on the UNIX box from the existing 500 to a higher limit, which was sufficient to handle all the processes kicked off by the ETL jobs.

The batch finished successfully after the fix.
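For anyone hitting the same wall, the current per-user limit is easy to check before calling the admin (ksh/bash syntax; the option letter can differ on other shells):

ulimit -u    # maximum number of processes the current user may create

If the batch spawns more processes than that number, you'll see the fork() failures above.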

Thanks
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

lakshya wrote: We increased the number of processes allowed per user on the UNIX box from the existing 500 to a higher limit, which was sufficient to handle all the processes kicked off by the ETL jobs. The batch finished successfully after the fix.
Hi,
May I know the command used to find the maximum number of processes allowed per user, and the command to increase it?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It's usually a UNIX kernel parameter named something like NPROC. But the name varies on different UNIXes.
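A few illustrations of where that knob lives (the names and files are platform-specific, so check your own system's documentation):

# Linux: per-user limit set in /etc/security/limits.conf, e.g.
#   dsadm  hard  nproc  2000
ulimit -u                          # verify the effective limit

# AIX: it's a system attribute
lsattr -El sys0 -a maxuproc        # show the current value
chdev -l sys0 -a maxuproc=2000     # raise it (as root)

# Solaris: set maxuprc in /etc/system, e.g.
#   set maxuprc=2000
# then reboot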
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

In other words, have a chat with an SA. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The original error also mentioned that a file had been created in the &PH& directory in your project (on the server). Is there any useful diagnostic information in that file?
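Something along these lines will get you to the newest one (the project path is just an example):

cd /path/to/project/'&PH&'     # substitute your project directory; quote the & characters
ls -t | head -5                # the five most recently written phantom logs
more DSD.RUN_xxxxx             # view the newest (the name here is a placeholder)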
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

One small tip to avoid these issues: PLEASE log off gracefully from DataStage and any database sessions.
Otherwise the processes are left hanging under each user, increasing the process load. :D
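You can spot the stragglers with something like this:

# Each connected DataStage client runs a dsapi_slave process on the server;
# any that remain after everyone has logged off are the hung sessions
ps -ef | grep dsapi_slave | grep -v grep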
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
sunayan_pal
Participant
Posts: 49
Joined: Fri May 11, 2007 12:24 am
Location: kolkata

Post by sunayan_pal »

I guess it is purely a resource problem; in my case 100% of the jobs run successfully on a re-run.
But what gets written in &PH&? Please suggest.
regards
sunayan
Nagaraj
Premium Member
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Post by Nagaraj »

Want to get rid of the phantom files in the &PH& directory.

This is what is shown in the phantom files after each job run:

[User@hostname &PH&]$ more DSD.RUN_37693_14850_558136
DataStage Job 337 Phantom 7978
The variable "APT_PERFORMANCE_DATA" is not in the environment.
DataStage Phantom Finished.
[User@hostname &PH&]$

This "APT_PERFORMANCE_DATA variable is there.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

This question is not related to the subject of this thread. Please begin a new thread.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.