Abnormal Termination

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Bala
Participant
Posts: 17
Joined: Mon Oct 14, 2002 8:05 pm

Post by Bala »

I have had this problem for the last few days. ETL jobs are failing without any error message. Re-running the jobs results in abnormal termination at a different job. I tried re-running the jobs after cleaning up the project, resources and the &PH& directory, but to no avail.
The following is the error message found in the &PH& directory when the job aborted abnormally.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Job Aborted after Fatal Error logged.
Program "DSD.WriteLog": Line 161, Abort.
Attempting to Cleanup after ABORT raised in stage CDWMainScheduler.JobControl

DataStage Phantom Aborting with @ABORT.CODE = 1
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

If anyone has had this experience before and resolved the problem, please share your knowledge.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

@ABORT.CODE = 1 means that an ABORT statement has been executed, which almost certainly means that your code has invoked the DSLogFatal function. You will need to insert debugging (DSLogInfo) statements in your job control code to determine exactly where this is occurring.
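
As a starting point for locating the abort, the phantom output files in the project's &PH& directory can be scanned for the abort lines quoted in the first post. The following is only a minimal Python sketch, not a DataStage tool: the function name `find_aborts` is hypothetical, the directory path is whatever your installation uses, and the two patterns are taken solely from the message shown above, so other abort messages may be worded differently.

```python
import re
from pathlib import Path

# Patterns taken from the &PH& message quoted earlier in this thread.
ABORT_CODE = re.compile(r"@ABORT\.CODE\s*=\s*(-?\d+)")
ABORT_STAGE = re.compile(r"ABORT raised in stage\s+(\S+)")


def find_aborts(ph_dir):
    """Return (file name, stage, abort code) for each file recording an abort.

    ph_dir is the path to a project's &PH& directory (an assumption:
    pass whatever location your installation uses).
    """
    results = []
    for path in sorted(Path(ph_dir).iterdir()):
        if not path.is_file():
            continue
        text = path.read_text(errors="replace")
        code = ABORT_CODE.search(text)
        if code is None:
            continue  # no abort code recorded in this file
        stage = ABORT_STAGE.search(text)
        stage_name = stage.group(1) if stage else None
        results.append((path.name, stage_name, int(code.group(1))))
    return results
```

Calling `find_aborts` on the &PH& directory gives one tuple per phantom file that recorded an abort, which shows at a glance which job control stage raised it. It does not replace the DSLogInfo statements suggested above; it only narrows down where to add them.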
Bala
Participant
Posts: 17
Joined: Mon Oct 14, 2002 8:05 pm

Post by Bala »

Thanks for the reply.
Actually, CDWMainScheduler is the main job, which calls many child jobs. The message in my previous post is the error message from the CDWMainScheduler job when one of its child jobs aborted abnormally without any error message. Could it be because some log file is full? If so, how do I clear it, or is there any other remedy?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Log files can grow to 2GB; it's highly unlikely the cause is a full log file (though log files with lots of entries can degrade performance).
Have you identified which child job aborted and, if so, is this consistent (is it always this job that fails)? Look carefully at the "job starting" message for this child job; do the parameters all have legitimate values, or are you seeing negative integers (which tends to indicate failures of DSGetParamInfo or DSSetParam in the controlling job)?
You really are going to have to exercise your diagnostic skills here, and narrow down the cause of failure. This will include verifying that the child job runs satisfactorily in isolation, checking (view their values in the log, or include diagnostic calls to DSLogInfo) that the parent job is accurately passing parameter values to the child job, and so on.
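
The parameter check described above can be sketched as follows. This is an illustrative Python helper, not part of DataStage: it assumes you have already copied the parameter names and values from the child job's "job starting" log entry into a dictionary by hand, and the names `suspicious_params`, `SourceDir`, `BatchId` and `RunDate` are hypothetical. It flags only values that parse as negative integers; a parameter that is legitimately negative would be flagged too, so treat the result as a hint rather than a diagnosis.

```python
def suspicious_params(params):
    """Return names of parameters whose value parses as a negative integer."""
    flagged = []
    for name, value in params.items():
        try:
            number = int(str(value).strip())
        except ValueError:
            continue  # not an integer at all, so not an error code
        if number < 0:
            flagged.append(name)
    return flagged


# Hypothetical values copied by hand from a "job starting" log entry.
example = {"SourceDir": "/data/in", "BatchId": "-3", "RunDate": "2002-12-16"}
print(suspicious_params(example))  # prints ['BatchId']
```

Any parameter flagged this way is a candidate for a closer look at the corresponding DSGetParamInfo or DSSetParam call in the controlling job.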
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In order to pre-empt those who would point out that the abort message was generated by the internal routine DSD.WriteLog, I would make the point that this is a VERY UNHAPPY routine if passed an invalid job handle, which explains my initial concentration on parameter values.
Bala
Participant
Posts: 17
Joined: Mon Oct 14, 2002 8:05 pm

Post by Bala »

The re-run of that ETL job was successful. But in the next run, another job is still in "Running" status, although the monitor shows "Finished" for all the links. What could be the reason for this? Please advise if anyone knows.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Separate topic - starting new thread. Topic: Job status "running" when all stages "finished"
Bala
Participant
Posts: 17
Joined: Mon Oct 14, 2002 8:05 pm

Post by Bala »

Ray,
Do you know anything about RT_LOG246?
Yesterday, a job was started and then immediately aborted abnormally. I noticed a lock on RT_LOG246 when I ran the command LIST.READU ALL in the UniVerse environment. So I would like to know the answers to the following questions.
1. What kind of lock is it, and what could be the possible reason(s) for it?
2. How do I release this lock?
3. How can I prevent this lock in future?
Please advise.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

RT_LOG246 is the table containing the job log for job number 246.
The type of lock on it is "RL", which is a shared read lock, indicating that it is being viewed by the Director client.
Closing the Director client will release this lock.
You may occasionally see an "RU" lock, which means that the job is running and actually writing to the log at that moment. When the write is complete, this lock is released automatically.
constc
Participant
Posts: 4
Joined: Sun Dec 15, 2002 7:03 pm

Post by constc »

Can you elaborate on the jobs that gave you this problem? Are you loading into a database? I have seen this problem before, but in that case it was due to a memory leak in the database client (OCI client).

This "Abnormal Termination" error or job hang error can be caused by numerous factors. Some areas to check: network bottlenecks, the database client and, as Ray suggested, your log file or temp file directory (system resources).

I'd strongly recommend that you raise this issue with Ascential Support. I believe they have had a few cases of the same issue reported to them before.

Hope this helps