
Job abending after 16 million-odd records

Posted: Mon Sep 27, 2004 6:38 pm
by Neil C
I have two different (but similar) DS jobs that read data from Oracle 9i (9.2.0.5), perform 3 hash file lookups and 3 Oracle lookups, and write (INSERT) rows into a table on a second Oracle instance.

My problem is that after ABOUT 16M rows (16,239,271 for one, 16,497,919 for the other) the job abends with the following error message. The second job has failed twice; the first time it failed at a different row count that I did not write down.

Project:VECCI_PRD (thyme.saturn.co.nz)
Job name:SorToDmCSTCALLSUMArborRated
Event #:230
Timestamp:24/09/04 10:29:52
Event type:Warning
User:veprd
Message:
Abnormal termination of stage SorToDmCSTCALLSUMArborRated..TDW_CST_CALL_SUM_TF detected

Output Rows per Transaction is set to 99,999, Read Committed, Insert Rows without clearing.

Input is using Read Committed, Array size = 1000.

There are no logs or alerts on the corresponding Oracle systems.

Is there a limit to how many rows a DS job may process, or is there a memory leak somewhere perhaps?

Posted: Mon Sep 27, 2004 8:16 pm
by kcbland
My guess would be that your runtime is so obscene (hours) that your connection is timing out. With 16 million rows, this job must be riding on only one CPU and waiting forever on the OCI lookups; without even knowing much about what you're doing, I'd guess your runtime is somewhere over 8 hours.

Unless I'm totally wrong in my guesses, your solution will be to optimize your process by breaking it down into multiple jobs: one high-performance, multi-instance extract job that uses a ranged, parameterized query to pull the data set to process out of your source; another high-performance, multi-instance transform job that uses a partitioning constraint in a transformer to divide up the source data set and use all available CPUs; and lastly, since you're doing INSERTs only, a sqlldr command-line execution in DIRECT path loading mode to slam in the rows.
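
A rough sketch of what that could look like (every table, column, file and parameter name below is hypothetical, not taken from the actual jobs):

    -- extract job: ranged, parameterized query, one key range per job instance
    -- (#LOW_KEY# / #HIGH_KEY# would be DataStage job parameters)
    SELECT call_id, cust_id, call_dt, charge_amt
    FROM   src_call_detail
    WHERE  call_id BETWEEN #LOW_KEY# AND #HIGH_KEY#

    -- load step: once the transform instances have written delimited files,
    -- slam them in with SQL*Loader in direct path mode (append, no clearing):
    --   sqlldr userid=dw_user/dw_pwd control=call_sum.ctl direct=true
    -- where call_sum.ctl is something like:
    LOAD DATA
    INFILE 'call_sum.dat'
    APPEND
    INTO TABLE tdw_cst_call_sum
    FIELDS TERMINATED BY '|'
    (call_id, cust_id, call_dt DATE "YYYY-MM-DD", charge_amt)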

Posted: Mon Sep 27, 2004 8:29 pm
by Neil C
Kenneth, yes to all of that. Run times were 15 and 20 hours. I have modified the jobs to select only those rows that did not get processed the first time round into a sequential file, and then to process those. One still had 3.5M rows to go, the other only 1.9M, so I should be able to get them to run in a reasonable time.
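
The "rows that did not get processed the first time round" selection can be a simple anti-join against the target; a minimal sketch, assuming a database link (dw_link) from the source instance to the warehouse and the same hypothetical names as above:

    SELECT s.call_id, s.cust_id, s.call_dt, s.charge_amt
    FROM   src_call_detail s
    WHERE  NOT EXISTS (SELECT 1
                       FROM   tdw_cst_call_sum@dw_link t
                       WHERE  t.call_id = s.call_id)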

Then the whole process is going to be redesigned to process that data daily instead of monthly (I did not design the original process!), so this problem should go away eventually. It will still perform like a dog, but hopefully it will run.

If it did time out, is this logged somewhere in DataStage?

Posted: Mon Sep 27, 2004 8:45 pm
by kcbland
It should have been, but who knows? Your problem, I'm 99% sure, is that you had a database connection open waaaay too long, and you were probably dealing with rollback segments on the load, snapshot segments on the source query, or a whole host of other issues.

This, folks, is why I recommend getting the data out of the source and into files as quickly as possible!!! You don't have to deal with a query hanging on the source!!!

It's not even worth trying to make this pig fly. By breaking down the design into smaller, modular, faster jobs, you'll remove most, if not all, of these issues.

Posted: Mon Sep 27, 2004 9:47 pm
by ray.wurlod
There's also been some recent posting activity - like in the past week or two - about an Oracle patch needed to prevent this (memory leak) kind of thing when 32-bit clients (which DataStage presently must use) connect to 64-bit Oracle. It might be worth chasing that up. Search the forum for "memory leak" and/or "Oracle" and "patch".

Posted: Mon Sep 27, 2004 10:11 pm
by chulett
Unless they are running on Tru64. :wink:

Posted: Mon Sep 27, 2004 10:22 pm
by Neil C
Thanks Ray. Found the note and passed it on to our DBA.