PX Job Hangs

satheesh_color
Participant
Posts: 182
Joined: Thu Jun 16, 2005 2:05 am

Post by satheesh_color »

Hi All,

I ran into a strange problem in yesterday's cycle. A PX job hung for more than 4 hours; normally it takes 30 minutes to finish. The same job has been running for the past year and we have never faced this kind of problem. The job simply stayed in Running status for more than 4 hours. We use TNG Unicenter to schedule the job through a Unix script.

Detail Spec:

DataStage: 7.5A
Server: HP Unix
Parallel System: SMP
Configuration File: 8-node configuration




               OraStg1         OraStg2
                  |               |
                  |               |
SRCOraStg ---> LookupStg1 ---> LookupStg2 ---> SortStg ---> MaxAggStg ---> Xfm1 ---> SecMaxAggStg ---> Xfm2 ---> TrgOraStg


Sorry, I am not able to draw it correctly. OraStg1 feeds LookupStg1 and OraStg2 feeds LookupStg2.

DS Log:


Lookupstg1,0 :Input 0 consumed 12333333 records
Lookupstg1,2
Lookupstg1,3
Lookupstg1,4
Lookupstg1,5
Lookupstg1,6
Lookupstg1,7


Lookupstg2,0:input 0 consumed 987 records
Lookupstg2,2
Lookupstg2,3
Lookupstg2,4
Lookupstg2,5
Lookupstg2,6
Lookupstg2,7

In the same way, every stage up to this point was processed on parallel nodes 0 to 7.

The problem is with the last three stages, which were processed on every parallel node except 6:

SecMaxAggStg: processed parallel nodes 0,1,2,3,4,5,7, i.e. all except 6
Xfm2: processed parallel nodes 0,1,2,3,4,5,7, i.e. all except 6
TrgOraStg: processed parallel nodes 0,1,2,3,4,5,7, i.e. all except 6

Every node except node 6 finished processing successfully, but the job was still in Running status.
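
Spotting the missing partition by eye gets tedious in a long log, so here is a rough Python sketch of how that check could be automated. It assumes each relevant log line starts with `<stage>,<partition>` as in the excerpt above; the function name `missing_partitions` and the sample lines are made up for illustration and are not part of DataStage.

```python
import re
from collections import defaultdict

# Assumed line shape, based on the log excerpt: "<stage>,<partition>[ :message]".
LINE_RE = re.compile(r"^\s*(?P<stage>[A-Za-z_]\w*),(?P<part>\d+)\b")


def missing_partitions(log_lines, node_count):
    """Return {stage: [partitions that never appear in the log]}."""
    seen = defaultdict(set)
    for line in log_lines:
        match = LINE_RE.match(line)
        if match:
            seen[match.group("stage")].add(int(match.group("part")))

    expected = set(range(node_count))
    return {
        stage: sorted(expected - parts)
        for stage, parts in seen.items()
        if expected - parts
    }


# Made-up sample: SortStg reports all 8 partitions, SecMaxAggStg never reports 6.
sample = [f"SortStg,{p}" for p in range(8)]
sample += [f"SecMaxAggStg,{p}" for p in (0, 1, 2, 3, 4, 5, 7)]

print(missing_partitions(sample, node_count=8))
```

A stage that is absent from the log altogether will not show up in the result, so the output only narrows down where to look.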


Action: In the end we killed the job and reran it; that time the job completed successfully.


I am not very familiar with PX jobs, so kindly share some ideas on this.


Thanks&Regards,
Satheesh
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

The status might be showing as Running, but have you checked the job's output? Was the target table populated? It is possible that the status display was simply not refreshed with the latest information.
Alternatively, node 6 may have run into a problem while the job was running, which could have caused the hang. Check whether anything was wrong on node 6 at that time.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
satheesh_color
Participant
Posts: 182
Joined: Thu Jun 16, 2005 2:05 am

Post by satheesh_color »

Hi Kumar,

The records were not fully populated. The job hung for almost 4 hours; normally it takes 30 minutes. When I reran the same job, it completed in 30 minutes. It's really painful to identify such a cause. Kindly give me your suggestions for oddities like this.


Thanks&Regards,
Satheesh
thebird
Participant
Posts: 254
Joined: Thu Jan 06, 2005 12:11 am
Location: India
Contact:

Post by thebird »

satheesh_color wrote: It's really painful to identify such a cause.
Some things just need to be done the hard way. :(

As Kumar suggested, since node 6 is the one missing in action according to your log, you could check that node for any trouble at the time of the run.
satheesh_color
Participant
Posts: 182
Joined: Thu Jun 16, 2005 2:05 am

Post by satheesh_color »

Hi All,

What should we take care of to guard against such PX hangs? We missed our SLAs for the first time in a long while because of this hang, so kindly suggest what precautions a job running in Production should take.

Thanks&regards,
Satheesh
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi Satheesh,

Try changing your config file to skip node 6 alone and check whether your job finishes within the expected time. That way you can nail down your problem.
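
One way to produce such a test file without hand-editing is sketched below in Python. The host name, disk paths and node names (`node0` to `node7`) are placeholders I made up, since your actual config file was not posted; the generated text follows the usual parallel configuration file layout (`node`, `fastname`, `pools`, `resource disk`, `resource scratchdisk`), but treat it as a starting point and compare it against your real file.

```python
def node_block(name, fastname, disk, scratch):
    """Render one node entry of a parallel configuration file."""
    return (
        f'\tnode "{name}"\n'
        "\t{\n"
        f'\t\tfastname "{fastname}"\n'
        '\t\tpools ""\n'
        f'\t\tresource disk "{disk}" {{pools ""}}\n'
        f'\t\tresource scratchdisk "{scratch}" {{pools ""}}\n'
        "\t}\n"
    )


def build_config(node_names, fastname, disk, scratch, skip=()):
    """Build the config text, leaving out any node named in `skip`."""
    blocks = [
        node_block(name, fastname, disk, scratch)
        for name in node_names
        if name not in skip
    ]
    return "{\n" + "".join(blocks) + "}\n"


# Hypothetical values: replace with the host and paths from your own file.
names = [f"node{i}" for i in range(8)]
config = build_config(
    names,
    fastname="etl_host",
    disk="/ds/data",
    scratch="/ds/scratch",
    skip={"node6"},
)
print(config)
```

Save the output under a new file name and point the job at it (for example through `APT_CONFIG_FILE`) for a test run, so the production 8-node file stays untouched.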