Page 1 of 2

Parallel job hanging on oracle SQLloader ?

Posted: Thu Sep 16, 2010 12:30 am
by wbeitler
Morning,

we're running a very simple parallel job, writing to an oracle table using the Load-method. We've got 2 nodes defined in the APT-config. For both nodes we receive the 'Export complete' message along with a rowcount. At times (by the look of it rather random) we're receiving just 1 'Load completed' message and the other node seems to be 'stuck' without any (error-)message. Just waiting in vain to report back the other Load completed message...

SQL-loader situation:
- the 'par' and 'ctl'-files are still in the scratch folder (unexpected)
- no bad-file is produced (as expected)
- log-file is complete and contains no errors (as expected)

Nothing strange in the db2diag.log either. So...
Any other place to look for the possible reason?
Or even better any solutions for this unpredictable behavior? :?

thanks again,

William

Posted: Thu Sep 16, 2010 3:31 am
by priyadarshikunal
Turn the operator combination off. And try looking in to oracle trace whether its oracle creating problems. Also, whats the APT_ORACLE_LOAD_OPTIONS environment variable setting.

Do you have any index rebuild or maintenance specified if using Oracle enterprise stage.

Re: Parallel job hanging on oracle SQLloader ?

Posted: Thu Sep 16, 2010 4:04 am
by wbeitler
Hi,

- APT_ORACLE_LOAD_OPTIONS isn't set explicitly, hence we're running with:

OPTIONS(DIRECT=TRUE, PARALLEL=TRUE, SKIP_INDEX_MAINTENANCE=YES)

- Oracle-trace doesn't show any problems.
- No indexes (or rebuilds) on the table yet. So index-maintenance options not set

Turned the operator combination off and currently rerunning.
But since the problem occures ' once in a while' not sure if we ' Catch' it this time...

thanks for your time and effort,

William

Posted: Thu Sep 16, 2010 4:21 am
by priyadarshikunal
I have seen problems like this in 8.0 release and turning operator combination off solved it for us. Didn't really got a satisfactory answer from IBM and since it got solved, haven't really pushed them either.

Posted: Thu Sep 16, 2010 6:51 am
by teddycarebears
What if it is a database problem and a deadlock occurs? Have you searched your tables if they are used in that very moments by another job or process ?

Posted: Thu Sep 16, 2010 7:13 am
by priyadarshikunal
in case of any deadlock, the trace file should have the details and since OP mentioned there is none, i believe its ok.

BTW, skip maintenance is not the default load option, for DIRECT=TRUE datatstage expects the index to be rebuilt or maintained unless SKIP MAINTENANCE is explicity specified. IMO.

Posted: Thu Sep 16, 2010 7:15 am
by wbeitler
Checked that. No Oracle processes running for that table when the job seemed to be hanging.
Have been constantly looping the 75+ jobs in which the error sometimes occured with the APT_DISABLE_COMBINATION set. No luck in trapping the error again so far...

Posted: Thu Sep 16, 2010 7:20 am
by priyadarshikunal
what version of datastage are you using? Have you checked if the process is still running on server machine for that job?

Posted: Thu Sep 16, 2010 7:29 am
by wbeitler
We're on 8.1 fixpack 1. No direct access on the ETL-server unfortunately, so couldn't check myself. But sure will get support to find out about serverside processes (if I get it to ' hang' again...)

William

Posted: Thu Sep 16, 2010 7:32 am
by wbeitler
wbeitler wrote:We're on 8.1 fixpack
ehhhrrr 8.1.2 that is... :o

Posted: Mon Sep 20, 2010 1:55 am
by wbeitler
Jobs hanging again. Datastage-processes still up, but not running (no changes in CPU or memory allocation). No DB-processes and no usefull information in the logs after having ran with APT_DISABLE_COMBINATION.
Just the job waiting there for the 'Load completed message' from the second node... Any other clues?

Posted: Mon Sep 20, 2010 6:06 am
by chulett
Have you involved your official support provider yet?

Posted: Mon Sep 20, 2010 7:29 am
by wbeitler
Nope, since DSXchange has beaten them in the past more than once... :roll: But you're right. Will give them a fair chance to prove otherwise... In the meantime, still open for suggestions...

Posted: Wed Sep 29, 2010 10:39 am
by wbeitler
Please tell me it can't be a lock on the APT configuration file when several jobs try to access it simultaneously ?! :roll:

Posted: Wed Sep 29, 2010 1:29 pm
by chulett
It can't be a lock on the APT configuration file when several jobs try to access it simultaneously.