Job is getting stuck

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Sravani
Participant
Posts: 23
Joined: Thu Jun 15, 2006 3:56 am
Location: Hyderabad

Post by Sravani »

Hi Gurus,
When I run a job from the Unix prompt, it starts and the OSH script is initiated, but it does not progress beyond that point. It gets stuck.
How can we track the status and find out why it is getting stuck like that?

Thanks.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Identify the process ids and then use the "truss -p {pid}" command to see what these processes might be doing.
Does this mean if you start the job from the director it does not get "stuck"? What does the job do?
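
As a rough sketch of the first step, the process ids can be pulled out of "ps -ef" output with a small filter. The function name pids_matching is made up for illustration, and the idea that the parallel engine processes carry "osh" in their command line is an assumption that should be checked against what "ps -ef" actually shows on your server.

```shell
# Hypothetical helper (not part of DataStage): print the PID column of
# every "ps -ef"-style line on stdin whose text matches the pattern in $1.
pids_matching() {
    # NR > 1 skips the header line; lines mentioning awk are dropped so the
    # filter does not report itself when fed from a live "ps -ef".
    awk -v pat="$1" 'NR > 1 && $0 ~ pat && $0 !~ /awk/ { print $2 }'
}

# Typical use (the pattern "osh" is an assumption, adjust as needed):
#   ps -ef | pids_matching osh
# Each PID printed can then be passed to: truss -p {pid}
```

The match is deliberately loose, so any unrelated process with the same text in its command line will also be listed. Narrow the pattern if that happens.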
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

ArndW wrote:Identify the process ids and then use the "truss -p {pid}" command to see what these processes might be doing.
Does this mean if you start the job from the director it does not get "stuck"? What does the job do?

Hi Arnd,

Sorry for posting my query in this thread, but I think it is the same problem.

I am getting the same problem, and I am not able to understand what is going on inside.

While analyzing the truss output, I can see that the sequence is running fine.

The job is also running, but I am unable to analyze it completely.

Output of truss -p for the sequence:

Code: Select all

[iehibu12] /u01/iisGIDev truss -p 1482774
_nsleep(0x00000000, 0x00000000) (sleeping...)
_nsleep(0x00000000, 0x00000000)                 = 0
sigprocmask(0, 0x00000000, 0x302DBE34)          = 0
klseek(39, 0, 6144, 0x00000000)                 = 0
kread(39, "\0\018 $\0\0\0 $\0\0\b C".., 2048)   = 2048
klseek(39, 0, 83968, 0x00000000)                = 0
kread(39, "\001 H T\0\018 T\0\0\b03".., 2048)   = 2048
klseek(39, 0, 110592, 0x00000000)               = 0
kread(39, "\001B0 L\001 H L\0\0\b03".., 2048)   = 2048
klseek(39, 0, 114688, 0x00000000)               = 0
kread(39, "\001C0 L\001B0 L\0\0\f03".., 2048)   = 2048
klseek(39, 0, 131072, 0x00000000)               = 0
kread(39, "\002\0 @\001C0 @\0\0\f03".., 2048)   = 2048
klseek(39, 0, 145408, 0x00000000)               = 0
kread(39, "\002 8 @\002\0 @\0\0\f03".., 2048)   = 2048
klseek(39, 0, 151552, 0x00000000)               = 0
kread(39, "\002 P H\002 8 H\0\0\f03".., 2048)   = 2048
klseek(39, 0, 178176, 0x00000000)               = 0
kread(39, "\002B8 H\002 P H\0\0\f03".., 2048)   = 2048
klseek(39, 0, 192512, 0x00000000)               = 0
kread(39, "\002F0 P\002B8 P\0\0\f03".., 2048)   = 2048
klseek(39, 0, 272384, 0x00000000)               = 0
kread(39, "\004 ( H\002F0 H\0\0\f03".., 2048)   = 2048
Then I tried to run truss -p on the job's process, but truss was unable to control that process.

So I tried:

Code: Select all

nice -5 truss -p 
The result is as follows:

Code: Select all

nice -5 truss -p 1523956
kread(0, 0x00000000, 0)                         Err#82 ERESTART
    Received signal #14, SIGALRM [caught]
sigprocmask(2, 0x300BF840, 0x00000000)          = 0
incinterval(0, 0x2FF1A428, 0x2FF1A438)          = 0
sigprocmask(0, 0x00000000, 0x2FF1A420)          = 0
sigprocmask(2, 0xF0464790, 0x2FF1A380)          = 0
_sigaction(14, 0x2FF1A440, 0x2FF1A430)          = 0
thread_setmymask_fast(0x00000000, 0x00000000, 0x00000000, 0x103EE005, 0x00000000, 0x00000164, 0x00000000, 0x00000000) = 0x00000000
appgettimer(9, 0x2FF1A500)                      = 0
sigprocmask(0, 0x00000000, 0x3020B004)          = 0
klseek(42, 0, 2048, 0x00000000)                 = 0
kread(42, "\0\0\t10\0\00110\0\0\f03".., 2048)   = 2048
sigprocmask(2, 0x3020B004, 0x00000000)          = 0
sigprocmask(0, 0x00000000, 0x2FF1A420)          = 0
sigprocmask(2, 0xF0464790, 0x2FF1A380)          = 0
_sigaction(14, 0x2FF1A440, 0x2FF1A430)          = 0
thread_setmymask_fast(0x00000000, 0x00000000, 0x00000000, 0x103EE005, 0x00000000, 0x000000C1, 0x00000000, 0x00000000) = 0x00000000
incinterval(0, 0x2FF1A428, 0x2FF1A438)          = 0
sigprocmask(0, 0x00000000, 0x300BF840)          = 0
kread(46, " # # I   I I S - D S E E".., 4096) (sleeping...)
kread(46, " # # I   I I S - D S E E".., 4096)   Err#82 ERESTART
    Received signal #14, SIGALRM [caught]
sigprocmask(2, 0x300BF840, 0x00000000)          = 0
incinterval(0, 0x2FF1A428, 0x2FF1A438)          = 0
sigprocmask(0, 0x00000000, 0x2FF1A420)          = 0
sigprocmask(2, 0xF0464790, 0x2FF1A380)          = 0
_sigaction(14, 0x2FF1A440, 0x2FF1A430)          = 0
thread_setmymask_fast(0x00000000, 0x00000000, 0x00000000, 0x103EE005, 0x00000000, 0x000000C6, 0x00000000, 0x00000000) = 0x00000000
appgettimer(9, 0x2FF1A500)                      = 0
sigprocmask(0, 0x00000000, 0x30273E64)          = 0
klseek(42, 0, 2048, 0x00000000)                 = 0
kread(42, "\0\0\t10\0\00110\0\0\f03".., 2048)   = 2048
sigprocmask(2, 0x30273E64, 0x00000000)          = 0
sigprocmask(0, 0x00000000, 0x2FF1A420)          = 0
sigprocmask(2, 0xF0464790, 0x2FF1A380)          = 0
_sigaction(14, 0x2FF1A440, 0x2FF1A430)          = 0
thread_setmymask_fast(0x00000000, 0x00000000, 0x00000000, 0x103EE005, 0x00000000, 0x000000C0, 0x00000000, 0x00000000) = 0x00000000
incinterval(0, 0x2FF1A428, 0x2FF1A438)          = 0
sigprocmask(0, 0x00000000, 0x300BF840)          = 0
kread(46, " # # I   I I S - D S E E".., 4096) (sleeping...)
kread(46, " # # I   I I S - D S E E".., 4096)   Err#82 ERESTART
    Received signal #14, SIGALRM [caught]
sigprocmask(2, 0x300BF840, 0x00000000)          = 0
incinterval(0, 0x2FF1A428, 0x2FF1A438)          = 0
sigprocmask(0, 0x00000000, 0x2FF1A420)          = 0
sigprocmask(2, 0xF0464790, 0x2FF1A380)          = 0
_sigaction(14, 0x2FF1A440, 0x2FF1A430)          = 0
thread_setmymask_fast(0x00000000, 0x00000000, 0x00000000, 0x103EE005, 0x00000000, 0x000000BC, 0x00000000, 0x00000000) = 0x00000000
appgettimer(9, 0x2FF1A500)                      = 0
sigprocmask(0, 0x00000000, 0x3020B004)          = 0
klseek(42, 0, 2048, 0x00000000)                 = 0
kread(42, "\0\0\t10\0\00110\0\0\f03".., 2048)   = 2048
sigprocmask(2, 0x3020B004, 0x00000000)          = 0
sigprocmask(0, 0x00000000, 0x2FF1A420)          = 0
sigprocmask(2, 0xF0464790, 0x2FF1A380)          = 0
_sigaction(14, 0x2FF1A440, 0x2FF1A430)          = 0
thread_setmymask_fast(0x00000000, 0x00000000, 0x00000000, 0x103EE005, 0x00000000, 0x000000B7, 0x00000000, 0x00000000) = 0x00000000
incinterval(0, 0x2FF1A428, 0x2FF1A438)          = 0
sigprocmask(0, 0x00000000, 0x300BF840)          = 0
kread(46, " # # I   I I S - D S E E".., 4096) (sleeping...)
kread(46, " # # I   I I S - D S E E".., 4096)   Err#82 ERESTART
    Received signal #14, SIGALRM [caught]
sigprocmask(2, 0x300BF840, 0x00000000)          = 0
incinterval(0, 0x2FF1A428, 0x2FF1A438)          = 0
sigprocmask(0, 0x00000000, 0x2FF1A420)          = 0
sigprocmask(2, 0xF0464790, 0x2FF1A380)          = 0
_sigaction(14, 0x2FF1A440, 0x2FF1A430)          = 0
thread_setmymask_fast(0x00000000, 0x00000000, 0x00000000, 0x103EE005, 0x00000000, 0x000000C1, 0x00000000, 0x00000000) = 0x00000000
appgettimer(9, 0x2FF1A500)                      = 0
sigprocmask(0, 0x00000000, 0x30273E64)          = 0
klseek(42, 0, 2048, 0x00000000)                 = 0
kread(42, "\0\0\t10\0\00110\0\0\f03".., 2048)   = 2048
sigprocmask(2, 0x30273E64, 0x00000000)          = 0
sigprocmask(0, 0x00000000, 0x2FF1A420)          = 0
sigprocmask(2, 0xF0464790, 0x2FF1A380)          = 0
_sigaction(14, 0x2FF1A440, 0x2FF1A430)          = 0
thread_setmymask_fast(0x00000000, 0x00000000, 0x00000000, 0x103EE005, 0x00000000, 0x000000C5, 0x00000000, 0x00000000) = 0x00000000
incinterval(0, 0x2FF1A428, 0x2FF1A438)          = 0
sigprocmask(0, 0x00000000, 0x300BF840)          = 0
^CPstatus: process is not stopped
What does this job do?

It performs change capture and inserts the changed records into Oracle.

Generally this job takes around 1 minute (or 2 seconds without any records).

I also tried truss -p on the sqlldr process, but the output was:

Code: Select all

[iehibu12] /u01/iisGIDev truss -p 3063884
kread(0, 0x0000000000000000, 0) (sleeping...)
^CPstatus: process is not stopped
I think sqlldr is sleeping indefinitely.

Please suggest the next step.

Regards,
Priyadarshi Kunal
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

In this case the truss output isn't of any help. I wonder if the job is not progressing because it is waiting for Oracle to complete? If you change the job to not write any changes to the database does it complete? Can you have your DBA monitor the DB while this job is running?
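
When a trace is as repetitive as the ones posted above, it can be quicker to count the calls than to read them line by line. The following is only a sketch: the function name summarize_truss and the file name /tmp/truss.out are made up, and it assumes the trace was first saved to a file (for example with the -o option of truss, if your version has it).

```shell
# Hypothetical helper: tally system calls in a saved truss trace read from
# stdin and print "count name" pairs, most frequent first.
summarize_truss() {
    awk '/^[A-Za-z_][A-Za-z0-9_]*\(/ {
             name = $0
             sub(/\(.*/, "", name)   # keep only the call name before "("
             n[name]++
         }
         END { for (s in n) print n[s], s }' | sort -rn
}

# Typical use (file name is illustrative):
#   summarize_truss < /tmp/truss.out
```

Indented lines such as the "Received signal" notices are ignored because they do not start with a call name. If the tally is dominated by the same few calls, the process is probably just polling while it waits on something else, which would be consistent with the job waiting on the database.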
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

ArndW wrote:In this case the truss output isn't of any help. I wonder if the job is not progressing because it is waiting for Oracle to complete? If you change the job to not write any changes to the database does it complete? Can you have your DBA monitor the DB while this job is running?
I tried that job again and it ran successfully.

According to my analysis, only those jobs that are designed to write to the database are hanging.

The DBA was not monitoring the database while the job was running, but when the job hung I consulted the DBA.
After investigating, he told me that the database is waiting for input but not receiving it from the server.

But I was running the job without any records, so it should have finished in 2 or 3 seconds.

That is why I am not sure at which point it got stuck.

Can you tell me what checklist to follow now to find the point of failure?
Priyadarshi Kunal