PX Job Hangs

satheesh_color · Post by **satheesh_color** » Thu Dec 14, 2006 9:01 pm

Hi All,

I had a strange problem in yesterday's cycle. A PX job hung for more than 4 hours; normally it takes 30 minutes to finish. The same job has been running for the past year and we never faced this kind of problem. The job simply stayed in Running status for more than 4 hours. We use TNG Unicenter to schedule the job through a Unix script.

Detail Spec:

DataStage: 7.5A
Server: HP Unix
Parallel System: SMP
Configuration File: 8-node configuration

                OraStg1         OraStg2
                   |               |
                   v               v
SRCOraStg ---> LookupStg1 ---> LookupStg2 ---> SortStg ---> MaxAggStg ---> Xfm1 ---> SecMaxAggStg ---> Xfm2 ---> TrgOraStg

Sorry, I wasn't able to draw it correctly. OraStg1 feeds LookupStg1 and OraStg2 feeds LookupStg2.

DS Log:

Lookupstg1,0 :Input 0 consumed 12333333 records
Lookupstg1,2
Lookupstg1,3
Lookupstg1,4
Lookupstg1,5
Lookupstg1,6
Lookupstg1,7

Lookupstg2,0:input 0 consumed 987 records
Lookupstg2,2
Lookupstg2,3
Lookupstg2,4
Lookupstg2,5
Lookupstg2,6
Lookupstg2,7

Like this, every stage reported processing on parallel nodes 0 to 7.

The problem is with the following stages:
SecMaxAggStg: processed on parallel nodes 0,1,2,3,4,5,7, i.e. all except 6
Xfm2: processed on parallel nodes 0,1,2,3,4,5,7, i.e. all except 6
TrgOraStg: processed on parallel nodes 0,1,2,3,4,5,7, i.e. all except 6

Except for node 6, every other node was processed successfully, but the job was still in Running status.

Action: In the end we killed the job and reran it, and that time the job completed successfully.

I am not very familiar with PX jobs, so kindly share some ideas on this.

Thanks&Regards,
Satheesh

kumar_s · Post by **kumar_s** » Thu Dec 14, 2006 9:18 pm

The status might be showing as Running, but have you checked the output of that job? Was the target table populated? There is a chance that the status shown was not updated with the latest information.
Or, as you probably noticed, node 6 might have had some problem at the time the job ran, which could have caused the job to hang. You can check whether anything was wrong with node 6 at that specific time.
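
To see quickly which partition went silent, a rough sketch like the one below could scan an exported copy of the job log. It assumes each relevant line starts with `StageName,partition` followed by a colon, as in the excerpt posted above; the function name `missing_partitions` and the sample lines are made up for illustration and are not part of DataStage.

```python
import re
from collections import defaultdict

# Assumed line shape, based on the posted excerpt: "StageName,<partition> : message"
LINE_RE = re.compile(r"^\s*(\w+),(\d+)\s*:")

def missing_partitions(log_lines, node_count):
    """Map each stage to the partitions (0..node_count-1) that never logged a line."""
    seen = defaultdict(set)
    for line in log_lines:
        m = LINE_RE.match(line)
        if m:
            seen[m.group(1)].add(int(m.group(2)))
    expected = set(range(node_count))
    return {
        stage: sorted(expected - parts)
        for stage, parts in seen.items()
        if expected - parts
    }

# Hypothetical sample: a 3-node run where Xfm2 never reports partition 1.
sample = [
    "Xfm1,0: Input 0 consumed 10 records",
    "Xfm1,1: Input 0 consumed 12 records",
    "Xfm1,2: Input 0 consumed 11 records",
    "Xfm2,0: Input 0 consumed 10 records",
    "Xfm2,2: Input 0 consumed 11 records",
]
print(missing_partitions(sample, 3))
```

For the job in this thread the call would use `node_count=8`; any stage listed in the result is one where a partition never reported, which is where to start looking.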

satheesh_color · Post by **satheesh_color** » Sat Dec 16, 2006 1:37 am

Hi Kumar,

The records were not fully populated. The job hung for almost 4 hours; normally it takes 30 minutes. When I reran the same job, it finished in 30 minutes. It's really painful to identify such a cause. Kindly give me your suggestions for such odd things.

Thanks&Regards,
Satheesh

thebird · Post by **thebird** » Sat Dec 16, 2006 9:39 am

satheesh_color wrote: It's really painful to identify such cause....

Some things just need to be done the hard way.

As Kumar suggested, since it's node 6 that is missing in action as per your log, you could check this node for any trouble at that time.

satheesh_color · Post by **satheesh_color** » Wed Dec 20, 2006 12:48 am

Hi All,

What should we take care of to avoid such PX hangs? We missed our SLAs for the first time in a long while because of this PX job hang. So kindly suggest what needs to be taken care of for jobs running in Production.

Thanks&regards,
Satheesh

kumar_s · Post by **kumar_s** » Wed Dec 20, 2006 1:27 am

Hi Satheesh,

Try changing your config file to skip node 6 alone and check whether your job finishes within the expected time. That way you can nail down your problem.
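
One way to produce such a reduced config file for the test is sketched below. It assumes the usual parallel configuration layout of `node "name" { ... }` blocks inside one outer pair of braces, and that partition 6 in the log corresponds to the seventh node entry in the file (0-based index 6); both are assumptions to verify against your own file. The helper name `drop_node` and the node names, host name and paths in the sample are invented for illustration.

```python
import re

def drop_node(config_text, index):
    """Return config_text without the index-th (0-based) node "..." { ... } block."""
    starts = [m.start() for m in re.finditer(r'\bnode\s+"[^"]*"\s*\{', config_text)]
    if not 0 <= index < len(starts):
        raise IndexError("no node block at index %d" % index)
    start = starts[index]
    depth = 0
    # Walk forward from the block's opening brace until its matching close.
    for i in range(config_text.index("{", start), len(config_text)):
        if config_text[i] == "{":
            depth += 1
        elif config_text[i] == "}":
            depth -= 1
            if depth == 0:
                end = i + 1
                break
    else:
        raise ValueError("unbalanced braces in config text")
    return config_text[:start] + config_text[end:]

# Hypothetical 3-node sample; a real file would have 8 entries and real paths.
sample = '''{
  node "n0" { fastname "host" pools "" resource disk "/d0" {pools ""} }
  node "n1" { fastname "host" pools "" resource disk "/d1" {pools ""} }
  node "n2" { fastname "host" pools "" resource disk "/d2" {pools ""} }
}'''
reduced = drop_node(sample, 1)
print(reduced)
```

Save the result under a new file name and point `APT_CONFIG_FILE` at it for the test run, so the original 8-node file stays untouched.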