Jobs running for a very long time

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

amarnadh0203
Participant
Posts: 10
Joined: Mon Jul 28, 2008 7:37 am

Post by amarnadh0203 »

Hi All,

We are moving from a 7.1R1 server environment to a 7.5.2 parallel Linux environment.

In the first phase we migrated all our server jobs from 7.1R1 to 7.5.2 and made only minimal changes to the jobs, e.g. environment variables, paths, etc.

As part of testing we are running the jobs, and a job keeps running for more than 24 hours, whereas the same job runs on 7.1R1 in a minute or two.

When I look at the processes on the Linux server, "Phantom DSD.RUN" keeps on running.

All the job does is pull data from a SQL table and load it into a sequential file. The job has several individual processes in a single server job.


Any help will be useful ... thanks in advance.

Regards,
Amar.... :)
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

There are several things I would do. Ask your DBA to see if sessions are open to SQL Server, and check to see if the job is still running as a UNIX process. Are any of these processes using CPU or IO? Use "truss" or your OS's equivalent to see which system calls are being used. That's for starters; there are more options available, but the ones listed above will almost always narrow down the cause.
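
One rough way to answer the "using CPU?" question from a shell is to sample a process's CPU counters twice and compare them. This is a minimal sketch that assumes Linux with /proc mounted; the function names cpu_ticks and is_busy are made up for illustration and the PID in the example is a placeholder.

```shell
#!/bin/sh
# Sketch only: relies on the Linux /proc filesystem.

# cpu_ticks PID: print the user + system CPU ticks PID has consumed so far.
cpu_ticks() {
    # Remove "pid (command) " first so a command name containing spaces
    # cannot shift the fields; utime and stime are then fields 12 and 13.
    sed 's/^.*) //' "/proc/$1/stat" | awk '{ print $12 + $13 }'
}

# is_busy PID [SECONDS]: succeed if PID used any CPU during the sample window.
# Returns 2 if the process does not exist.
is_busy() {
    [ -r "/proc/$1/stat" ] || { echo "no such process: $1" >&2; return 2; }
    before=$(cpu_ticks "$1")
    sleep "${2:-5}"
    after=$(cpu_ticks "$1")
    [ "${after:-0}" -gt "${before:-0}" ]
}

# Example (PID is a placeholder):
#   is_busy 12345 10 && echo "working" || echo "idle or waiting"
```

A process that stays "idle or waiting" over several samples is most likely blocked rather than slow, which is when tracing its system calls becomes worthwhile.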
amarnadh0203
Participant
Posts: 10
Joined: Mon Jul 28, 2008 7:37 am

Post by amarnadh0203 »

I can see the processes in SQL Server in "AWAITING COMMAND" status for a long time, and I can also see the processes on the Linux server. The job remains in Running status too.


n431c7 20823 26610 0 11:36 ? 00:00:00 phantom DSD.RUN P444_04_RI_Control_Stat 0/0/1/0/0
n431c7 20832 20823 0 11:36 ? 00:00:00 phantom DSD.StageRun P444_04_RI_Control_Stat. P444_04_RI_Control_Stat.Transformer_3 1 0/0/1
n431c7 20836 20823 0 11:36 ? 00:00:00 phantom DSD.StageRun P444_04_RI_Control_Stat. P444_04_RI_Control_Stat.Transformer_7 1 0/0/1
n431c7 20837 20823 0 11:36 ? 00:00:00 phantom DSD.StageRun P444_04_RI_Control_Stat. P444_04_RI_Control_Stat.Transformer_8 1 0/0/1
n431c7 20838 20823 0 11:36 ? 00:00:00 phantom DSD.StageRun P444_04_RI_Control_Stat. P444_04_RI_Control_Stat.Transfor
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

What does "truss -p 20832" show?
amarnadh0203
Participant
Posts: 10
Joined: Mon Jul 28, 2008 7:37 am

Post by amarnadh0203 »

"truss -p 20832" says there is no such command.

Please provide any alternative.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I went to Google and entered "LINUX equivalent of truss" and found out that you need to enter the "strace" or "ltrace" command.
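
Since the tracer name differs by platform, a small helper can pick whichever one is installed. This is only a sketch: pick_tracer is a hypothetical name, and the PID in the usage comment is a placeholder. All three tools accept "-p PID" to attach to a running process.

```shell
#!/bin/sh
# pick_tracer: print the first system-call tracer found on PATH.
pick_tracer() {
    for tool in truss strace ltrace; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            return 0
        fi
    done
    echo "no tracer found (tried truss, strace, ltrace)" >&2
    return 1
}

# Example (PID is a placeholder; attaching usually requires being the
# process owner or root):
#   tracer=$(pick_tracer) && "$tracer" -p 12345
```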
amarnadh0203
Participant
Posts: 10
Joined: Mon Jul 28, 2008 7:37 am

Post by amarnadh0203 »

I got the following output:

n431c7@tedshdu2 /home/n431c7 $ strace -p 18240
Process 18240 attached - interrupt to quit
[ Process PID=18240 runs in 32 bit mode. ]
futex(0x3dc820, FUTEX_WAIT, 2, NULL
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Are the other processes also waiting on the same or similar locks?
amarnadh0203
Participant
Posts: 10
Joined: Mon Jul 28, 2008 7:37 am

Post by amarnadh0203 »

Yes, the other processes show the same details as the first one.

Could you please explain this in detail?
satish_valavala
Participant
Posts: 123
Joined: Wed May 18, 2005 7:41 am
Location: USA

Post by satish_valavala »

Which user compiled the jobs, and which user is running them? Check whether you have any UNIX user or group permission issues.
Regards
VS
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Are these server jobs (as marked) or parallel jobs (as posted)?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I haven't worked with FUTEX locks, but they seem to be the Linux equivalent of MUTEX calls. So all the separate server job processes seem to be waiting on a signal that they are probably never going to get. Do you have any "defunct" or "zombie" processes visible?
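
To look for defunct processes without depending on a particular ps output format, something like the following could be used. It is a sketch that assumes Linux with /proc mounted; list_zombies is a hypothetical name. A zombie shows state "Z" in /proc/PID/stat.

```shell
#!/bin/sh
# list_zombies: print "pid parent=ppid" for every process in state Z.
list_zombies() {
    for dir in /proc/[0-9]*; do
        # Drop "pid (command) " so field 1 is the state and field 2 the
        # parent PID. A process may vanish mid-scan, so hide read errors.
        sed 's/^.*) //' "$dir/stat" 2>/dev/null |
            awk -v pid="${dir#/proc/}" '$1 == "Z" { print pid, "parent=" $2 }'
    done
}

list_zombies
```

The parent PID matters because a zombie only goes away once its parent collects its exit status, so a zombie whose parent is one of the stuck job processes would be a useful clue.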