Abnormal termination of transformer stage

Posted: Wed Dec 05, 2007 3:56 am
by saikrishna
Hi

When I run a job that selects records from an Oracle table and inserts them into another Oracle table, I get the following warning and then the job aborts.

Abnormal termination of stage push_user_table_10pct_m..xfm_stage detected



The design of the job is as follows:

Oracle Stage -> Transformer -> Oracle Stage

In the transformer, I am doing a direct one-to-one mapping from the source stage to the target stage.

Why did this error occur, and what could be the solution?

FYI: The number of records in the source table is 28499271.

Thanks
Sai

Posted: Wed Dec 05, 2007 5:03 am
by saikrishna
I am getting the following message on the next run:

From previous run
DataStage Job 413 Phantom 3117
Abnormal termination of DataStage.
Fault type is 11. Layer type is BASIC run machine.
Fault occurred in BASIC program JOB.1133835455.DT.1454325365.TRANS1 at address 2fc.
CRITICAL ERROR! Notify the system administrator.



Thanks
Sai

Posted: Wed Dec 05, 2007 8:23 am
by chulett
First suggestion would be to search for your error message "Fault type is 11" and see if that helps.

Posted: Wed Dec 05, 2007 10:24 am
by saikrishna
I have searched the forum on this...

But I could not resolve my issue.

I tried running my job with 1 lakh (100,000) records, and it worked fine.

When I run all 28 million records, I get this problem at around 12-13 million records.

I removed the target DB stage and replaced it with a Sequential File stage, but the problem is still not resolved. So I feel the problem is not with the target DB stage.

My job design is a passive-active-passive connection, so there is no active-active link. Therefore, I did not enable in-process or inter-process row buffering.

FYI: The same job ran successfully on our old server (version 7.5). We migrated the jobs from that server to our new server (version 7.5.2) on a different machine.

Any clues from this info?

Thanks
Sai

Posted: Wed Dec 05, 2007 10:28 am
by saikrishna
More info comparing the old server and the new server:
1. Our new server has NLS installed; the old server had no NLS installation.
2. I compared the uvconfig files on the two servers and found the following differences:

< GLTABSZ 130
< GLTABSIZE 130
---
> GLTABSZ 120
214,215c213
< RLTABSZ 130
< RLTABSIZE 130
---
> RLTABSZ 120
287c285
< MAXRLOCK 129
---
> MAXRLOCK 119
554c552
< DMEMOFF 0xbdfd3000
---
> DMEMOFF 0x88b8000
563c561
< PMEMOFF 0xbf431000
---
> PMEMOFF 0x61a8000
572c570
< CMEMOFF 0xbf434000
---
> CMEMOFF 0x6b6c000
581c579
< NMEMOFF 0xbf635000
---
> NMEMOFF 0x62a2000
618,796d615

Also, the uvconfig file on our new server contains some NLS parameters.
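
For reference, here is roughly how I compared the two files, and how an edited uvconfig could be applied (a sketch; the old-server copy path is hypothetical, and regenerating the configuration needs the engine stopped):

  # Sketch, assuming $DSHOME points at the DataStage engine directory and
  # /tmp/uvconfig.old is a copy brought over from the old server (hypothetical path)
  cd $DSHOME
  diff uvconfig /tmp/uvconfig.old

  # To apply an edited uvconfig (e.g. after changing RLTABSZ/GLTABSZ):
  bin/uv -admin -stop      # stop the DataStage engine first
  bin/uvregen              # regenerate the binary configuration from uvconfig
  bin/uv -admin -start     # restart the engine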


Thanks
Sai

Posted: Wed Dec 05, 2007 10:30 am
by ArndW
If you add a constraint "1=2" to your job, does it still abort? Does it abort at the same row number every time? If you drop a few columns from the source and don't write anything to the output, does it still fail, and does it fail in the same place? Basically, the goal is to make simple changes to the job until the error goes away, and then narrow down the cause.

Posted: Wed Dec 05, 2007 11:59 am
by saikrishna
Hi ArndW

FYI: The job is not aborting at the same row number.

I will try the 1=2 constraint and the other options you provided.

One thing I should mention: it takes around 2 hours to run this job before it aborts, so I have to wait 2 hours for every experiment.

Thanks
Sai

Posted: Thu Dec 06, 2007 2:03 am
by saikrishna
Hi

a. The job with the 1=2 constraint aborted in the middle, at around 13 million records (out of 28 million total).

b. The job that loads directly from OCI stage to OCI stage (without a transformer) also aborted.

Any other solutions?

Thanks
Sai

Posted: Thu Dec 06, 2007 3:23 am
by ray.wurlod
What is the transaction size and array size in the target OCI stage? Is the Oracle database exhausting some resource, such as the size of the rollback segment? (Ask your Oracle DBA to check while this job is running.)

Posted: Thu Dec 06, 2007 4:13 am
by saikrishna
Hi

Transaction size is 0 and array size is 30000.
We have set NOLOGGING on the target table we need to load, so no rollback segment will be generated.
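
As a rough back-of-envelope check of what that array size means for memory (the ~200-byte average row width below is my assumption, not something we measured):

  # 30000 rows per array fetch * ~200 bytes per row (assumed average width)
  expr 30000 \* 200
  # => 6000000, i.e. roughly a 6 MB fetch buffer per array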

We have even tried replacing the target OCI stage with a Sequential File stage and running the job again, but we still get a similar problem. From this experiment, I concluded that the problem is not in the database.

Let me know if you have any other suggestions.

Thanks
Sai

Posted: Thu Dec 06, 2007 4:36 am
by saikir
Hi,

We experienced a very similar problem in our project some time ago. I remember changing the array size to zero and the job then working fine. Have you tried changing the array size to zero?

Sai

Posted: Thu Dec 06, 2007 5:14 am
by ArndW
You've now narrowed the problem down to your source stage and transformer stage. Next, try a job that doesn't use an explicit transformer stage and just goes from source -> sequential file. Does the error persist? If you remove all the columns apart from the key and one small data column, does the error persist?

Posted: Thu Dec 06, 2007 5:43 am
by saikrishna
The main hindrance to experimentation now is that each run takes around 2 hours to give a result. Your advice is taken, but I will have to see whether that time is available; as this is production, I have to load without delay.

Thanks
Sai

Posted: Thu Dec 06, 2007 6:25 am
by saikrishna
The job that loads from the Oracle stage directly to a Sequential File stage also aborted at around 13 million records.


Thanks
Sai

Posted: Thu Dec 06, 2007 7:15 am
by srinagesh
Are there any restrictions on your UNIX server such that a process cannot remain active for more than an hour?

I may be wrong, but I believe the error you are getting is due to the UNIX process being killed.

Can you do ps -ef | grep phantom, look for the job that is aborting, and see whether its processes are still there?
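
Something along these lines might show it (a sketch; exact limit names vary by system):

  # watch the job's phantom process while the job runs
  ps -ef | grep phantom | grep -v grep

  # check per-process limits that could kill or cut short a long run
  # (cpu time, file size, data segment size)
  ulimit -a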