Error calling subroutine: DSR_EXECJOB (Action=5)
Posted: Tue Jul 23, 2013 11:25 am
Hi
I am facing a strange problem while running a job.
Let me explain the process a little bit.
We have data in SQL Server that needs to be loaded into Oracle. We load the following 3 tables:
Header (primary key H1, also has a column X1)
Detail_1 (primary key D1, foreign key H1, also has columns X1 and X2)
Detail_2 (primary key D2, foreign key D1, also has a column X2)
In terms of cardinality, there is a one-to-one correspondence between the tables. The total number of rows is roughly 7 million, so that is about 7 million headers, 7 million Detail_1 rows and 7 million Detail_2 rows.
There are about 8 jobs in the process. The first 4 jobs check whether the batch in SQL Server is ready to be posted to Oracle, dump the SQL Server data out to datasets and run some validations.
Once the validations pass, the 5th job loads the data into the Header table.
The next job loads the data into the DETAIL_1 table.
This job has the following design:
Dataset >> Transformer (removes leading/trailing spaces from the key columns)
Oracle stage (reads rows from the Header table)
The dataset supplies 7 million detail rows and the Oracle stage retrieves 7 million headers.
The output of these two streams is fed to a Join stage where they are inner joined on column X1 to get the corresponding value of H1. The output of the join goes to a Transformer where some date-related values are converted for Oracle, and finally the output of the Transformer is fed to an Oracle stage that uses upsert to load the rows.
It is this job that, after running for a while, fails in a rather strange manner. It returns error 30107 to the calling script. When it first starts executing I can see it as "Running" in Director, but once it errors out and I go back into Director it only shows "Compiled" - the "Running" instance can no longer be seen. When I click on the job (which is showing as Compiled), Director throws an error saying:
Error calling subroutine: DSR_EXECJOB (Action=5); check DataStage is set up correctly in project PrepBatch
(Subroutine failed to complete successfully (30107))
FOLLOWED BY
Unable to return an attached job
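In case it helps, here is roughly how a dsjob-based wrapper could run the job and capture that return code - this is only a simplified sketch, not our exact script, and the job name LoadDetail1 is a placeholder:

#!/bin/ksh
# Simplified sketch: run the job and capture the return code
# (PrepBatch is the real project; LoadDetail1 is a placeholder job name)
PROJECT=PrepBatch
JOB=LoadDetail1

$DSHOME/bin/dsjob -run -jobstatus $PROJECT $JOB
rc=$?
echo "dsjob returned $rc"        # this is where the 30107 comes back for us

# Pull the last fatal log entries and the job status to see what the engine recorded
$DSHOME/bin/dsjob -logsum -type FATAL -max 20 $PROJECT $JOB
$DSHOME/bin/dsjob -jobinfo $PROJECT $JOB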
But in the meantime, when I check the DETAIL_1 table in Oracle, it has loaded all 7 million records successfully.
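That check is nothing more than a row count from sqlplus on the AIX box (the connection details here are placeholders):

sqlplus -s ora_user/ora_pass@ORADB <<'EOF'
SELECT COUNT(*) FROM detail_1;
EOF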
So far I have tried the following:
1. Rebooted the machine
2. Deleted the entire folder that contains these jobs, reimported and recompiled.
3. Reindexed the project, making sure that nobody else was logged in.
We have 8 GB of RAM and 16 GB of virtual memory. Before I start the process, the vmstat command on AIX shows:
avm = 636105 and fre = 1291657
During the header load and the DETAIL_1 load, the vmstat output looks like:
avm = 752916 and fre = 6122
topas shows paging space as 100% available. We use a 4-node configuration and the scratch space for all 4 nodes is fully available (60 GB each). The mount point where the datasets etc. are created has 40 GB available.
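To put those vmstat numbers in context: on AIX, avm and fre are reported in 4 KB pages, so a quick conversion gives:

echo "avm MB: $((752916 * 4 / 1024))"   # about 2941 MB of active virtual memory in use
echo "fre MB: $((6122 * 4 / 1024))"     # only about 23 MB left on the free list during the load

In other words, while paging space looks fine, the free list is almost exhausted while the header and DETAIL_1 loads are running.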
I don't know why this is happening or what I can do to stop it. We handle 10 times the volume in the rest of our jobs and they run fine.