
Rare Error Message

Posted: Fri Jan 06, 2006 8:59 am
by I_Server_Whale
Hi All,

When one of my jobs runs, it aborts with a 'Floating Point Exception' message. The message also mentions that it is a Phantom.

Has anyone encountered this type of error message before?

Any tiny bit of insight or help is very much appreciated.

Thanks much,
Naveen.

Re: Rare Error Message

Posted: Fri Jan 06, 2006 11:58 am
by yaminids
Naveen,

We got a similar error message when our server was too busy to trigger a job. We did not see the problem again after reducing the load on the server.

Hope this helps
Yamini

Posted: Fri Jan 06, 2006 12:08 pm
by ArndW
Does this error happen each time? Are you doing any division of numbers in your transform stage?

Posted: Fri Jan 06, 2006 12:54 pm
by chulett
Actually, "too busy" would send a different error, a -14.

Best to post the entire message. More than likely it's as Arnd noted.

Posted: Tue Aug 29, 2006 3:50 am
by TBartfai
We are also facing the same error when looking up from a hashed file in a Transformer stage. There are no divisions, and all the data types are set accordingly. There is only one constraint, Not(IsNull()), and that is all that is in the Transformer.
It occurs only with hashed files: if pre-load to memory is disabled, the job works fine; if pre-load to memory is enabled, it fails.
The strange thing is that the jobs failing now had been running successfully for a year.

From what I can see, the hashed files are not created correctly. They are not converted into the usual data and overflow files; instead, a separate file is created per record, where the file names are the keys and the contents are the other fields. I guess the hashed file creation somehow fails in DataStage without any indication.
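For comparison, a healthy Type 30 (dynamic) hashed file directory normally holds just the data and overflow files plus a hidden type marker, something like the sketch below (LOOKUP_HASH and its path are only examples):

Code:

$ ls -a /data/project/hashfiles/LOOKUP_HASH
.  ..  .Type30  DATA.30  OVER.30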

There is no useful message in the log, nor in the &PH& directory.
Only these two warnings can be seen in the log:

Code:

Attempting to Cleanup after ABORT raised in stage Jobname..Hashname

Code:

   Message: DataStage Job 973 Phantom 26173
Floating point exception
Attempting to Cleanup after ABORT raised in stage 
DataStage Phantom Aborting with @ABORT.CODE = 3
This is from &PH&:

Code:

DataStage Job 974 Phantom 26150
Job Aborted after Fatal Error logged. 

Posted: Tue Aug 29, 2006 5:58 am
by ray.wurlod
Clearly it's not rare. :P

Floating point exceptions ought not to occur with hashed files, though it is possible with corrupted hashed files (where the modulus calculation performs an improper division). Have you checked the structural integrity of the hashed file(s) that your job accesses?
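For example, from the Administrator client's Command window (or the uv shell) you could try something along the lines of the sketch below; MyHashedFile stands for a VOC pointer to the hashed file, which you can create with SETFILE if the file lives in a directory path rather than in the project account (the path is only an example):

Code:

>SETFILE /data/project/hashfiles/LOOKUP_HASH MyHashedFile OVERWRITING
>FILE.STAT MyHashedFile
>ANALYZE.FILE MyHashedFile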

Otherwise, where are you performing any floating point arithmetic?

Posted: Tue Aug 29, 2006 6:22 am
by TBartfai
First, thanks for your reply.
Unfortunately, it is not rare :(

It happened this week in two different processes with two different hashed files. I suspected the hashed file was responsible, as it looks wrong in the file system and disabling memory pre-load eliminates the problem.
But I would like to keep pre-load turned on in production.

Could you please explain how I can check the structural integrity of the hashed file?

Three jobs are failing on the same hashed file. The hashed file is supposed to be created as Type 30 (dynamic), with 'Allow stage write cache' turned on.

The funny thing is that reading from the hashed file as the driver input works fine.
I mean the following job:

(corrupted?) hashed file -> Transformer -> seq file
                                 ^
                                 |
                         another hashed file


But this one fails:
Oracle -> Transformer -> seq file
               ^
               |
   (corrupted?) hashed file


Unfortunately, the affected hashed file in the UNIX directory does not look like a normal one. There are separate files for each record, where the file name is the key of the hashed file.


We are not performing any mathematical operations, just looking up data from the hashed file.

Posted: Tue Aug 29, 2006 7:17 am
by chulett
TBartfai wrote: Unfortunately, the affected hashed file in the UNIX directory does not look like a normal one. There are separate files for each record, where the file name is the key of the hashed file.
I forget what silly type this is, but it is one of the ways that a dynamic hashed file can 'corrupt' itself, especially if you are deleting and recreating it each time. The loss of the hidden .Type30 file for any reason can revert it to a type where every record is a separate file. Best to remove the contents yourself so the file can properly rebuild itself.
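A minimal sketch of that cleanup, assuming the hashed file lives in a pathed directory you control (the path and name below are examples only; double-check them before deleting anything):

Code:

$ cd /data/project/hashfiles
$ rm -rf LOOKUP_HASH          # remove the corrupted directory entirely
$ # the next run, with 'Create file' enabled, rebuilds it as a proper dynamic file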

Posted: Tue Aug 29, 2006 8:38 am
by TBartfai
Deleting the hashed file solves the problem; I forgot to mention that I had already tried that :oops:

The reason I am so interested in this case is that our customer does not accept this workaround, and keeps insisting that we open a defect and prevent this behaviour :(

Do you happen to know how to prevent this corruption?

BTW, you hit the nail on the head: we recreate almost every hashed file again and again every day.

Or could moving to a static hashed file type eliminate such behaviour?
I would rather not do that, as it would affect all our processes and jobs :(

Thanks for your reply, and I would much appreciate any further help.

Posted: Tue Aug 29, 2006 8:44 am
by chulett
TBartfai wrote: BTW, you hit the nail on the head: we recreate almost every hashed file again and again every day.
Stop. :wink:

While the norm is to rebuild the contents of hashed files run to run, there usually isn't an overwhelming need to delete and recreate them each run as well. Why not switch to simply clearing them each time?
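If you ever want to do it outside the job, a hashed file that has a VOC pointer can also be cleared from the Administrator command window; a quick sketch (MyHashedFile is a placeholder):

Code:

>CLEAR.FILE DATA MyHashedFile
>COUNT MyHashedFile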

Posted: Tue Aug 29, 2006 9:05 am
by TBartfai
:oops:

I meant that we have the 'Create file' option, 'Allow stage write cache', and 'Clear file before writing' enabled.
It is not actually creating the file again and again.

I have to take a rest before writing silly things. :)

But I am still interested in how to avoid this floating point exception; maybe we will open a case...

Posted: Tue Aug 29, 2006 9:35 am
by ArndW
Craig has already identified the issue: the missing (hidden) .Type30 file. Often this happens when someone copies the two visible files without noticing the third, hidden one. Does your process include any scripts that might be moving these hashed files around?
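For instance, a copy like the first line below silently drops the hidden .Type30 marker because the shell glob does not match dot files, while the second form keeps it (the paths are only examples):

Code:

$ cp -r /src/hashfiles/LOOKUP_HASH/* /dst/hashfiles/LOOKUP_HASH/   # misses .Type30
$ cp -r /src/hashfiles/LOOKUP_HASH/. /dst/hashfiles/LOOKUP_HASH/   # copies hidden files too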

Posted: Tue Aug 29, 2006 9:55 am
by netland
Just two things that I can come up with:

a missing .Type30 file,
and too many subdirectories; I think the limit was 32K.

Besides not working, it will also make the hashed file very slow :o/


And please help your customer find a solution, not a workaround... we have seen way too many workarounds in DS :evil:

Posted: Tue Aug 29, 2006 10:22 am
by chulett
:? Hmmm... I can't think of anything off the top of my head that I've had to 'work around'. And I believe that 'too many subdirectories' issue is specific to the AIX operating system. Not much you can do there other than live with that limitation.

The OP may be running into resource issues. I'm curious whether there has been any attempt to 'tweak' the settings in the uvconfig file for DataStage to help with this. For example, I do believe that flirting with the edge of this parameter could cause the issue you are seeing:

Code:

# T30FILE - specifies the number of
#       dynamic files that may be opened.
#       Used to allocate shared memory
#       concurrency control headers.
T30FILE 200
Are your values 'out of the box'?
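For reference, if that value ever does need raising, the usual sequence is roughly the sketch below, run as the DataStage administrative user from the engine directory (take a copy of uvconfig first, and treat this as an outline rather than a recipe):

Code:

$ cd $DSHOME
$ grep T30FILE uvconfig        # check the current value
$ bin/uv -admin -stop          # stop the engine with no jobs running
$ vi uvconfig                  # raise T30FILE
$ bin/uvregen                  # regenerate the binary configuration
$ bin/uv -admin -start         # restart the engine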

Posted: Tue Aug 29, 2006 10:35 am
by TBartfai
Again thanks for all your answers :)

We are not doing any manual or custom-developed hashed file manipulation; we are using only the built-in stage properties.

But I have checked the uvconfig files on our test, production-like, and production servers, and the T30FILE value is either 2000 or 2050.

I do not exactly understand why it was set that way :(
Maybe it is because we are using many hashed files for lookups instead of database lookups. There are also hashed files of around 3 GB :roll:
I know it is not recommended :); we are currently working on getting rid of them.