
PX Job Hangs

Posted: Thu Dec 14, 2006 9:01 pm
by satheesh_color
Hi All,

Something strange happened in yesterday's cycle. A PX job hung for more than 4 hours; it normally finishes in 30 minutes. The same job has been running for the past year and we have never faced this kind of problem. The job simply stayed in Running status for more than 4 hours. We use TNG Unicenter to schedule the job through a Unix script.

Detail Spec:

DataStage:7.5A
Server:HP Unix
Parallel System:SMP
Configuration File: 8node configuration




                OraStg1         OraStg2
                   |               |
                   v               v
SRCOraStg ---> LookupStg1 ---> LookupStg2 ---> SortStg ---> MaxAggStg ---> Xfm1 ---> SecMaxAggStg ---> Xfm2 ---> TrgOraStg

Sorry if the diagram is not drawn exactly: OraStg1 is the lookup source for LookupStg1 and OraStg2 is the lookup source for LookupStg2.

DS Log:


Lookupstg1,0 :Input 0 consumed 12333333 records
Lookupstg1,2
Lookupstg1,3
Lookupstg1,4
Lookupstg1,5
Lookupstg1,6
Lookupstg1,7


Lookupstg2,0:input 0 consumed 987 records
Lookupstg2,2
Lookupstg2,3
Lookupstg2,4
Lookupstg2,5
Lookupstg2,6
Lookupstg2,7

In the same way, the log shows every stage processing on parallel nodes 0 to 7.

The problem is SecMaxAggStg: it processed on parallel nodes 0, 1, 2, 3, 4, 5 and 7, i.e. all except 6.
The problem is Xfm2: it processed on parallel nodes 0, 1, 2, 3, 4, 5 and 7, i.e. all except 6.
The problem is TrgOraStg: it processed on parallel nodes 0, 1, 2, 3, 4, 5 and 7, i.e. all except 6.


Every node except node 6 was processed successfully, but the job was still in Running status.
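The check described above (which partition never reports for a stage) can be done mechanically. This is a minimal Python sketch, assuming the log has been saved as text and that each relevant line starts with `<stage>,<partition>` as in the excerpt above. The function name `missing_partitions` and the sample lines are illustrative only; they are not real log output.

```python
import re
from collections import defaultdict

# Matches the "<stage>,<partition>" prefix used in the log excerpt,
# e.g. "Lookupstg1,0 :Input 0 consumed 12333333 records" or "Lookupstg1,2".
LINE_RE = re.compile(r"^\s*(\w+),(\d+)\b")


def missing_partitions(log_lines, node_count):
    """Return {stage: sorted partition numbers that never appear in the log}."""
    seen = defaultdict(set)
    for line in log_lines:
        m = LINE_RE.match(line)
        if m:
            seen[m.group(1)].add(int(m.group(2)))
    expected = set(range(node_count))
    return {
        stage: sorted(expected - parts)
        for stage, parts in seen.items()
        if expected - parts
    }


if __name__ == "__main__":
    # Made-up lines in the same shape as the excerpt: Xfm2 never reports
    # partition 6, while SortStg reports all eight partitions.
    sample = ["Xfm2,%d" % p for p in (0, 1, 2, 3, 4, 5, 7)]
    sample += ["SortStg,%d" % p for p in range(8)]
    print(missing_partitions(sample, node_count=8))
```

Running this against the saved log of a hung run lists, per stage, the partitions that never reported, which is quicker than reading the Director log by eye.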


Action: In the end we killed the job and reran it, and that time it completed successfully.


I am not very familiar with PX jobs, so kindly share some ideas on this.


Thanks&Regards,
Satheesh

Posted: Thu Dec 14, 2006 9:18 pm
by kumar_s
The status might be showing as Running, but have you checked the output of that job? Was the target table populated? There is a chance that the status was not refreshed with the latest information.
Or, as you probably noticed, node 6 might have had some problem while the job was running, which could have caused the job to hang. You can check whether anything was wrong with node 6 at that specific time.

Posted: Sat Dec 16, 2006 1:37 am
by satheesh_color
Hi Kumar,

The records were not fully populated. The job hung for almost 4 hours; normally it takes 30 minutes. When I reran the same job, it finished in 30 minutes. It's really painful to identify such a cause. Kindly give me your suggestions for such odd behaviour.


Thanks&Regards,
Satheesh

Posted: Sat Dec 16, 2006 9:39 am
by thebird
satheesh_color wrote: It's really painful to identify such a cause....
Some things just need to be done the hard way. :(

As Kumar suggested, since node 6 is the one missing from your log, you could check that node for any trouble at that time.

Posted: Wed Dec 20, 2006 12:48 am
by satheesh_color
Hi All,

What needs to be taken care of to avoid such PX hangs? We missed our SLAs for the first time in a long while because of this hang. Kindly suggest what should be taken care of for jobs running in Production.

Thanks & regards,
Satheesh
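
One possible safeguard for the scheduling script is a wall-clock limit, so that a run that normally takes 30 minutes raises an alert well before 4 hours have passed. This is a minimal Python sketch and not a DataStage feature: `run_with_timeout` is a hypothetical helper, the command it wraps would be whatever the existing Unix script runs today, and the stand-in command here only sleeps. Killing the client process may not stop the job on the server, so the main value is the early alert.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return ("done", returncode) or ("timeout", None)."""
    proc = subprocess.Popen(cmd)
    try:
        return ("done", proc.wait(timeout=timeout_s))
    except subprocess.TimeoutExpired:
        # The limit was exceeded: stop waiting and reap the child process.
        proc.kill()
        proc.wait()
        return ("timeout", None)


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with the
    # command the scheduler script really uses (assumption, not shown here).
    cmd = [sys.executable, "-c", "import time; time.sleep(0.1)"]
    status, rc = run_with_timeout(cmd, timeout_s=5)
    if status == "timeout":
        print("job exceeded its time limit; alert the on-call team")
    else:
        print("job finished with exit code", rc)
```

A limit of roughly two to three times the normal run time is a common starting point, but the right value depends on how much the data volume varies.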

Posted: Wed Dec 20, 2006 1:27 am
by kumar_s
Hi Satheesh,

Try changing your config file to skip node 6 alone and check whether your job finishes within the expected time. That way you can nail down your problem.
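
A minimal sketch of producing such a test configuration, assuming the parallel configuration file uses the usual `node "name" { ... }` blocks and that partition 6 in the log corresponds to the seventh node listed. The node names, host name and paths in the sample are made up, and `drop_node` is a hypothetical helper, not a DataStage tool. Work on a copy of the file and point the test run at the copy (for example through `APT_CONFIG_FILE`) rather than editing the production file.

```python
import re


def drop_node(config_text, node_name):
    """Return config_text without the `node "<node_name>" { ... }` block.

    Assumes braces never appear inside quoted strings in the file.
    """
    m = re.search(r'node\s+"%s"\s*\{' % re.escape(node_name), config_text)
    if m is None:
        raise ValueError("node %r not found" % node_name)
    depth = 0
    i = m.end() - 1  # index of the block's opening brace
    while i < len(config_text):
        ch = config_text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                break  # i now points at the block's closing brace
        i += 1
    else:
        raise ValueError("unbalanced braces in configuration")
    return config_text[:m.start()] + config_text[i + 1:].lstrip()


def sample_config(n):
    """Build a made-up n-node SMP configuration for demonstration only."""
    blocks = []
    for k in range(1, n + 1):
        blocks.append(
            '  node "node%d" {\n'
            '    fastname "etlhost"\n'
            '    pools ""\n'
            '    resource disk "/data/ds/node%d" {pools ""}\n'
            '    resource scratchdisk "/scratch/ds/node%d" {pools ""}\n'
            '  }\n' % (k, k, k)
        )
    return "{\n" + "".join(blocks) + "}\n"


if __name__ == "__main__":
    # Drop the seventh node (partition 6 when counting from zero).
    print(drop_node(sample_config(8), "node7"))
```

If the job runs cleanly on the resulting 7-node file several times, that points at the resources of the removed node (its disk or scratch disk paths, for instance) rather than at the job design.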