Job abending after 16 million odd records

Neil C
Participant
Posts: 46
Joined: Sun Apr 06, 2003 8:59 pm
Location: Auckland, New Zealand


Post by Neil C »

I have two different (but similar) DS jobs that read data from Oracle 9i (9.2.0.5), perform 3 hash file lookups and 3 Oracle lookups, and write (INSERT) rows into a table on a second Oracle instance.

My problem is that after ABOUT 16M rows (16,239,271 for one, 16,497,919 for the other) each job abends with the error message below. The second job has failed twice; the first failure was at a different row count that I did not write down.

Project:VECCI_PRD (thyme.saturn.co.nz)
Job name:SorToDmCSTCALLSUMArborRated
Event #:230
Timestamp:24/09/04 10:29:52
Event type:Warning
User:veprd
Message:
Abnormal termination of stage SorToDmCSTCALLSUMArborRated..TDW_CST_CALL_SUM_TF detected

The output stage has Rows per Transaction set to 99,999, isolation level Read Committed, and update action Insert rows without clearing.

The input stage is using Read Committed with an Array size of 1000.
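
In other words, at the client level those two settings behave roughly like the sketch below (Python with the python-oracledb driver, purely to illustrate the mechanics; the connection details, table, and columns are made up, and you would not run this against 9i):

Code:

    # Sketch: array binding ("Array size") plus periodic commits
    # ("Rows per Transaction"). All names here are hypothetical.
    import oracledb

    ARRAY_SIZE = 1000        # rows shipped per round trip
    ROWS_PER_TXN = 99_999    # rows between commits

    def source_rows():
        # Hypothetical stand-in for the extract/lookup pipeline.
        for i in range(250_000):
            yield (i, "CST", 1.25)

    conn = oracledb.connect(user="veprd", password="xxx", dsn="dwhost/DWPRD")
    cur = conn.cursor()
    insert_sql = "INSERT INTO tdw_cst_call_sum (id, src, amt) VALUES (:1, :2, :3)"

    batch, uncommitted = [], 0
    for row in source_rows():
        batch.append(row)
        if len(batch) == ARRAY_SIZE:
            cur.executemany(insert_sql, batch)   # one round trip per batch
            uncommitted += len(batch)
            batch.clear()
            if uncommitted >= ROWS_PER_TXN:
                conn.commit()   # bounds the size of any one transaction
                uncommitted = 0
    if batch:
        cur.executemany(insert_sql, batch)
    conn.commit()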

There are no logs or alerts on the corresponding Oracle systems.

Is there a limit to how many rows a DS job may process, or is there a memory leak somewhere perhaps?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

My guess would be that your runtime is so obscene (hours) that your connection is timing out. At 16 million rows, this job must be riding on only 1 CPU and waiting forever on the OCI lookups; without knowing much about what you're doing, I'd guess your runtime is somewhere over 8 hours.

Unless I'm totally wrong in my guesses, your solution will be to optimize your process by breaking it down into multiple jobs: one high-performance, multi-instance extract job with a ranged, parameterized query to pull the data set to process out of your source; another high-performance, multi-instance transform job that uses a partitioning constraint in a transformer to divide up the source data set and utilize all available CPUs; and lastly, since you're doing INSERTs only, a sqlldr command-line execution using DIRECT path loading mode to slam in the rows. A sketch of the last two pieces follows.
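
Roughly, the partitioning constraint and the load step look like this (a Python sketch of the idea, not DataStage code; the partition count, credentials, control file, and data file names are all made up):

Code:

    # Sketch only: a MOD-style partitioning constraint plus a
    # direct-path SQL*Loader load. All names/credentials are made up.
    import subprocess

    PART_COUNT = 4  # one job instance per available CPU

    def belongs_to(key: int, part_num: int) -> bool:
        # Same idea as a transformer constraint such as
        # MOD(key, PART_COUNT) + 1 = PART_NUM: each instance keeps
        # only its own slice of the source rows.
        return key % PART_COUNT + 1 == part_num

    # Each instance writes its slice to a flat file; the files are then
    # loaded with sqlldr in DIRECT path mode, so there is no long-lived
    # INSERT transaction holding rollback segments open for hours.
    for part in range(1, PART_COUNT + 1):
        subprocess.run(
            ["sqlldr",
             "userid=veprd/secret@DWPRD",   # hypothetical credentials
             "control=call_sum.ctl",        # hypothetical control file
             f"data=call_sum_{part}.dat",
             "direct=true"],
            check=True)
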
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Neil C
Participant
Posts: 46
Joined: Sun Apr 06, 2003 8:59 pm
Location: Auckland, New Zealand

Post by Neil C »

Kenneth, yes to all of that. Run times were 15 and 20 hours. I have modified the jobs to select only those rows that did not get processed the first time round into a sequential file, and then to process those. One still had 3.5M rows to go, the other only 1.9M, so I should be able to get them to run in a reasonable time.
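
The catch-up selection is essentially an anti-join against the target key, spooled to a flat file. Something like this sketch (Python again; the table, key, and database link names are invented):

Code:

    # Sketch of the catch-up extract: pull only the rows that never
    # reached the target and spool them to a delimited sequential file.
    # Table, key, and db link names are invented.
    import csv
    import oracledb

    DELTA_SQL = """
        SELECT s.*
        FROM   src_call_sum s
        WHERE  NOT EXISTS (SELECT 1
                           FROM   tdw_cst_call_sum@dw_link t
                           WHERE  t.call_key = s.call_key)
    """

    conn = oracledb.connect(user="veprd", password="xxx", dsn="srchost/SRCPRD")
    cur = conn.cursor()
    cur.execute(DELTA_SQL)

    with open("call_sum_catchup.dat", "w", newline="") as f:
        writer = csv.writer(f, delimiter="|")
        for row in cur:
            writer.writerow(row)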

Then the whole process is going to be redesigned to process the data daily instead of monthly (I did not design the process originally!), so this problem should go away eventually. It will still perform like a dog, but hopefully it will run.

If it did time out, is this logged somewhere in DataStage?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

It should have been, but who knows? Your problem, I'm 99% sure, is that you had a database connection open waaaay too long, and were probably dealing with rollback segments on the load query, snapshot segments on the source query, or a whole host of other issues.

This, folks, is why I recommend getting the data out of the source and into files as quickly as possible!!! You don't have to deal with a query hanging on the source!!!

It's not even worth trying to make this pig fly. By breaking down the design into smaller, modular, faster jobs, you'll remove most, if not all, of these issues.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There's also been some recent posting activity - like in the past week or two - about an Oracle patch needed to prevent this (memory leak) kind of thing when 32-bit clients (which DataStage presently must use) connect to 64-bit Oracle. Might be worth chasing that up. Search the forum for "memory leak" and/or "Oracle" and "patch".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Unless they are running on Tru64. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Neil C
Participant
Posts: 46
Joined: Sun Apr 06, 2003 8:59 pm
Location: Auckland, New Zealand

Post by Neil C »

Thanks Ray. Found the note and passed it on to our DBA.