Abnormal termination of stage (Solved)

Post by **datastagedummy** » Tue Aug 05, 2003 3:31 pm

I have a job that processes 345,889 rows and aborts at around 270,000 rows with the following message:

--------------------------------------------------------------------
Message from DataStage Director:

71: xGDW00BocFeedActBudMtrlBillCost..AgGDDWBudMatBillCost: %s
72: Abnormal termination of stage xGDW00BocFeedActBudMtrlBillCost..CleanseMtrlNumber detected
73: Job xGDW00BocFeedActBudMtrlBillCost aborted.
--------------------------------------------------------------------
I am using DataStage 6.0.

What does the "%s" mean? I don't know how to decipher these cryptic messages.

I tried splitting the data by running the same job separately for each of our system IDs (35 runs in all), and all 35 runs completed, so it doesn't look like a data issue.

I am sourcing from Oracle using the ORAOCI9 stage and, after a couple of lookups and an aggregation, writing to a sequential file.

HELP me! Thanks in advance.

SOLUTION:
----------------------------------------------------------------------
Thanks a lot to all of you for your responses. Because of the cryptic "%s" message we could not work out what the problem was. The general consensus was that it was a disk space problem, but it was not; it was a memory problem. This job had a lot of hash file lookups against some huge files, which it was trying to preload into memory, and it was running out of memory.

I wish DataStage gave a better message, such as "Out of memory", rather than a "%s".
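One rough way to check for this before running such a job is to add up the on-disk size of the hash files it preloads and compare the total with the memory you expect the job to have. The sketch below is a hypothetical Python helper, not a DataStage tool: the paths and the budget are assumptions you would supply, it treats a hash file as either a single file or a directory of files, and on-disk size is only an approximation of the memory a preloaded hash file actually uses.

```python
import os

def on_disk_size(path):
    """Total size in bytes of a single file, or of every file under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

def preload_fits(hash_file_paths, budget_bytes):
    """Rough check (hypothetical helper): does the combined on-disk size of
    the hash files a job preloads stay within a memory budget?
    Returns (fits, total_bytes)."""
    total = sum(on_disk_size(p) for p in hash_file_paths)
    return total <= budget_bytes, total
```

For example, `preload_fits(["/path/to/HashA", "/path/to/HashB"], 512 * 1024**2)` would compare two (hypothetical) hash files against a 512MB budget.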

Thanks once again to all of you.

Post by **chulett** » Tue Aug 05, 2003 3:54 pm

The '%s' is simply a placeholder for a string that (for some reason) hasn't been populated correctly.

Do you have the latest version for your platform? Have you checked for additional information in your '&PH&' directory? Do you have access to Support? I'd probably log this one with them for resolution, as it looks a little... 'buggy'.

-craig

Post by **datastagedummy** » Tue Aug 05, 2003 3:56 pm

Craig, what do I look for in &PH&?

Post by **chulett** » Tue Aug 05, 2003 5:09 pm

Well... you want to look for entries from the time your job aborted and see if there is any additional information there.

For example, you'll see DSD.RUN files, DSD.Stagerun files and sometimes 'trace' files. They are just text files that can be opened and viewed like any other text file. Sometimes there is helpful information buried in there that you can use to troubleshoot issues, sometimes not.
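If the directory holds many files, a small script can narrow it down to the ones written around the time of the abort. A minimal Python sketch, assuming you know the abort time from the Director log; the function name, the ten-minute window and the example path are hypothetical, and it only lists candidate files for you to open and read.

```python
import os
from datetime import datetime, timedelta

def ph_files_near(ph_dir, abort_time, window_minutes=10):
    """Return (path, modification time) for regular files in ph_dir that were
    modified within +/- window_minutes of abort_time, oldest first."""
    window = timedelta(minutes=window_minutes)
    hits = []
    with os.scandir(ph_dir) as entries:
        for entry in entries:
            if not entry.is_file():
                continue  # skip subdirectories and other non-files
            mtime = datetime.fromtimestamp(entry.stat().st_mtime)
            if abs(mtime - abort_time) <= window:
                hits.append((entry.path, mtime))
    hits.sort(key=lambda hit: hit[1])
    return hits
```

For example, `for path, mtime in ph_files_near("/path/to/project/&PH&", datetime(2003, 8, 5, 15, 25)): print(mtime, path)` lists the files in a (hypothetical) project's &PH& directory that changed within ten minutes of 3:25 pm.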

-craig

Post by **datastagedummy** » Tue Aug 05, 2003 5:22 pm

When I reset the job using Director, I get the following message:

---------------------------------------------------------------------
From previous run
DataStage Job 2129 Phantom 28311
Abnormal termination of DataStage.
Fault type is 11. Layer type is BASIC run machine.
Fault occurred in BASIC program DSP.ActiveRun at address f7d4dd40

---------------------------------------------------------------------

Is there something hidden in this message that DataStagedummy cannot see? [V]

Post by **kcbland** » Tue Aug 05, 2003 6:44 pm

You're probably getting into a space issue. Since you didn't describe the job design, we don't know which stages are in your job. My guess would be that your Aggregator stage is running out of space somewhere. You're also in that magic area where, if you haven't pre-sorted the aggregation data and told the Aggregator it's sorted, weird things happen.

You don't have a lot of data; I'd sort it for grins and then aggregate it. It would probably take less time to sort and then aggregate than to let the "Aggravator" stage grunt through unsorted data. In addition, with sorted input it doesn't have to act like a passive stage and use up disk space while it temporarily parks the data; it can pass the groups through as they break.
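The difference can be illustrated outside DataStage with a small Python analogy (this is not how the Aggregator stage is implemented internally, and the function names are hypothetical). With input already sorted on the grouping key, each group can be emitted as soon as the key changes, so only one group is held at a time; without that guarantee, every group has to be kept until the last row has been read.

```python
from itertools import groupby
from operator import itemgetter

def aggregate_sorted(rows, key=itemgetter(0), value=itemgetter(1)):
    """Sum values per key, assuming rows arrive sorted by key.
    Each group is yielded as soon as its key changes."""
    for k, group in groupby(rows, key=key):
        yield k, sum(value(r) for r in group)

def aggregate_unsorted(rows, key=itemgetter(0), value=itemgetter(1)):
    """Sum values per key with no ordering assumption: every key's running
    total is held until the input is exhausted, then all are returned."""
    totals = {}
    for r in rows:
        k = key(r)
        totals[k] = totals.get(k, 0) + value(r)
    return sorted(totals.items())
```

In this sketch `aggregate_sorted` trusts the sort order: fed unsorted input, it emits the same key more than once instead of failing, so the declared order has to match the data.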

Kenneth Bland

Post by **chulett** » Wed Aug 06, 2003 7:11 am

As Ken mentions, it is important to understand that the Aggregator, unless presented with sorted data and *told* of that fact, may become a choke point in the job: it gathers up *all* of your data records and sorts them (possibly again!) to do the grouping it needs before it starts to output any records. This can take a wee bit of temp/sort space sometimes as well. [;)]

-craig

Post by **datastagedummy** » Thu Aug 07, 2003 9:11 am

Ken & Craig, thanks a lot for your responses, but the dummy has one more question: how do I *tell* the Aggregator stage that the input data is *SORTED*?

Post by **tonystark622** » Thu Aug 07, 2003 10:20 am

On the Input tab of the Aggregator stage, the Columns tab should show a column in the grid marked Sort. Put 1, 2, 3, 4, etc. in this field to designate the sort order.

hope this helps,
Tony

Post by **spracht** » Thu Aug 07, 2003 12:49 pm

Ken

You assume that the job / the server is running out of space 'somewhere'. Are you serious? When you start learning a programming language, you're usually taught to make sure you have the resource you want to use and, if not, to issue a warning and terminate if necessary.

If a DataStage server can't get the memory it demands from the operating system, why can't it say so? If there is no more disk space, what prevents DS from conceding: "OK boys, I'd really like to do what you want me to, but if you don't have disk space anymore, I can't"?

Stephan

Post by **kcbland** » Thu Aug 07, 2003 2:46 pm

spracht, I don't know whether you're serious about your question or making fun of me.

If you're seriously asking the questions:

Will a DataStage job bomb when it runs out of disk space? The answer is YES.

Will it bomb in such a way as to be difficult to reset? The answer is YES.

Will it bomb in such a way so that you have to do extreme things to reset it to a runnable state? The answer is YES.

Don't take my word for it. Go to Director --> Job --> Clear Status File. That's there because these things DO HAPPEN. Do a "ps -ef|grep phantom" and see your job threads executing. Sometimes a job blows up and leaves a thread out there. You can spot those easily because their PPID is 1. You WILL NOT be able to run that job properly again until that thread is killed; it will screw up your job.
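The check described above (a phantom process whose parent PID is 1) can be scripted. A minimal Python sketch, assuming a Unix server with a POSIX `ps`; it asks for just the PID, PPID and command columns with `ps -eo pid,ppid,args` rather than `ps -ef`, which gives a simpler set of columns to parse. The function names are hypothetical, and the script only lists candidates: confirming that a process really is a stale job thread before killing it is still up to you.

```python
import subprocess

def current_ps_output():
    """Capture `ps -eo pid,ppid,args` output (POSIX ps options)."""
    return subprocess.run(["ps", "-eo", "pid,ppid,args"],
                          capture_output=True, text=True, check=True).stdout

def orphaned_phantoms(ps_output):
    """Return (pid, command) for processes whose command line mentions
    'phantom' and whose parent PID is 1, i.e. the original parent is gone."""
    orphans = []
    for line in ps_output.splitlines()[1:]:   # skip the header row
        fields = line.split(None, 2)          # PID, PPID, full command line
        if len(fields) < 3:
            continue
        pid, ppid, cmd = fields
        if ppid == "1" and "phantom" in cmd:
            orphans.append((int(pid), cmd))
    return orphans
```

Typical use is `for pid, cmd in orphaned_phantoms(current_ps_output()): print(pid, cmd)`.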

How about this one; it's my favorite. Install DataStage (NT or Unix, doesn't matter) and put your projects on one file system. Get a bunch of people designing or running jobs in all of the projects at the same time. Then do something to run that file system out of space (like putting your hash files on the same file system as the projects). BLAMMO, all of your projects are corrupted and unrecoverable. You've "blinked" all of the project hash files that were open and being worked in (job logs, status files, the one hash file in every project that contains all the job design information, etc.).
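In the spirit of spracht's point, a wrapper script can at least refuse to start a job when free space is already low, and warn when hash files share a file system with the projects. A minimal Python sketch using `shutil.disk_usage` and `os.stat`; the function names, paths and threshold are hypothetical, and a check made before the run cannot stop a job from filling the disk partway through.

```python
import os
import shutil

def check_free_space(path, min_free_bytes):
    """Raise RuntimeError if the file system holding `path` has less than
    `min_free_bytes` free; otherwise return the free byte count."""
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        raise RuntimeError(
            f"only {free} bytes free on the file system holding {path!r}; "
            f"at least {min_free_bytes} required")
    return free

def same_filesystem(path_a, path_b):
    """True if both paths live on the same device (file system)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

For example, a wrapper might call `check_free_space("/path/to/projects", 2 * 1024**3)` before starting a job, and print a warning if `same_filesystem("/path/to/projects", "/path/to/hashfiles")` is true.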

Okay, so if you're just poking fun at me: Programmers are programmers. You know how good your team is, so why does everyone expect software engineers to be superman? Microsoft has billions of dollars and thousands of engineers, and they still put out stupid code. So why should anyone else be different? I've probably got 50 typos and grammatical errors in this posting alone.

Kenneth Bland

Post by **mhester** » Thu Aug 07, 2003 3:23 pm

There are two (2) separate issues logged with Ascential that may be of interest to you regarding this problem.

GTAR - G37114
Reported - 1/30/2002
Release - 4.1.1

This indicates that the problem has happened to others and does not appear to be space-related. There is no solution for this issue yet. Another customer ran into a similar problem, and that GTAR also has not been resolved, or at least has no workaround.

You are using the 9i version of the OCI stage - did you try the ORAOCI8 stage? Maybe give that a try.

I don't know if there is enough information to point to either the Aggregator or the source stage. Both issues submitted to Ascential focused on Oracle rather than the Aggregator stage (only one of them involved aggregation).

Try changing to ODBC, even though that would be slower; maybe you can rule out the source or the aggregation.

Try not to focus too long on the current configuration; instead, change the design slightly to help eliminate possible points of failure.

Let us know,

Michael Hester

Post by **spracht** » Fri Aug 08, 2003 11:24 pm

Ken

I must admit that my English reaches its limits here. I was able to guess what AFAIK or IMHO mean, but BLAMMO [?][?][?]

Stephan

Post by **chulett** » Sat Aug 09, 2003 7:49 am

[:D] BLAMMO is not an acronym, it's the sound of a large explosion.

-craig

Post by **ray.wurlod** » Thu Aug 14, 2003 4:51 pm

The Aggregator stage does not request all the memory it requires up front, because it cannot know a priori how many distinct values it will need to group. Instead, it allocates an initial space (default 8KB), then grows this by a small multiplier (default 2) until a particular threshold is reached, after which it uses a larger multiplier (default …).

These can be changed from their defaults for a particular job using the DS.TOOLS menu, option 6.
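To make the growth pattern concrete, here is a Python sketch that lists the successive allocation sizes under the scheme described above. The 8KB start and the multiplier of 2 are the defaults quoted; the threshold and the larger multiplier are parameters with no default, because those defaults are not stated in the text above, so any numbers you pass for them are assumptions. The function name is hypothetical. What it illustrates is that memory is requested in ever larger steps as groups accumulate, so a request can fail well into a run rather than at start-up.

```python
def allocation_steps(needed, threshold, large_multiplier,
                     initial=8 * 1024, small_multiplier=2):
    """Successive allocation sizes in bytes: start at `initial`, multiply by
    `small_multiplier` while below `threshold`, then by `large_multiplier`,
    stopping once the size reaches `needed`."""
    if small_multiplier <= 1 or large_multiplier <= 1:
        raise ValueError("multipliers must be greater than 1")
    size = initial
    steps = [size]
    while size < needed:
        factor = small_multiplier if size < threshold else large_multiplier
        size *= factor
        steps.append(size)
    return steps
```

With a hypothetical 32KB threshold and a larger multiplier of 4, reaching 100,000 bytes takes the steps 8192, 16384, 32768 and 131072: each step is a new, larger request that the operating system may refuse.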

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518