Rare Error Message

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

I_Server_Whale
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Rare Error Message

Post by I_Server_Whale »

Hi All,

When one of my jobs runs, it aborts with a 'Floating Point Exception' message. The message is preceded by a note that it comes from a Phantom.

Has anyone encountered this type of error message before?

Any tiny bit of insight or help is very much appreciated.

Thanks much,
Naveen.
yaminids
Premium Member
Posts: 387
Joined: Mon Oct 18, 2004 1:04 pm

Re: Rare Error Messsage

Post by yaminids »

Naveen,

We got a similar error message when our server was too busy to trigger a job. We did not face the problem after reducing the load on the server.

Hope this helps
Yamini
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Does this error happen each time? Are you doing any division of numbers in your transform stage?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Actually, "too busy" would send a different error, a -14.

Best to post the entire message. More than likely it's as Arnd noted.
-craig

"You can never have too many knives" -- Logan Nine Fingers
TBartfai
Premium Member
Posts: 15
Joined: Wed Jul 28, 2004 5:26 am

Post by TBartfai »

We are also facing the same error when looking up from a hashed file in a Transformer stage. There are no divisions, and all the data types are set correctly. The only constraint is Not(IsNull()), and that is all there is in the Transformer.
It occurs only with hashed files. If pre-load to memory is disabled, the job works fine; if it is enabled, the job fails.
The strange thing is that the jobs failing now had been running successfully for a year.

From what I can see, the hashed files are not created correctly. They are not built as DATA and OVER files; instead a separate file is created per record. The file names are the keys, and the contents are the other fields. I guess the hashed file creation somehow fails in DataStage without any indication.

There is no useful message in the log, nor in the &PH& directory.
Only these two warnings can be seen in the log:

Code:

Attempting to Cleanup after ABORT raised in stage Jobname..Hashname

Code:

   Message: DataStage Job 973 Phantom 26173
Floating point exception
Attempting to Cleanup after ABORT raised in stage 
DataStage Phantom Aborting with @ABORT.CODE = 3
This is from &PH&:

Code:

DataStage Job 974 Phantom 26150
Job Aborted after Fatal Error logged. 
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Clearly it's not rare. :P

Floating point exceptions ought not to occur with hashed files, though they are possible with corrupted hashed files (where the modulus calculation performs an improper division). Have you checked the structural integrity of the hashed file(s) that your job accesses?
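
For example, from the Administrator client's Command window (or a dssh/uvsh session in the project directory), something along these lines is a minimal sketch; MYHASH is a placeholder for the hashed file's VOC name, and a file created by pathname would first need a VOC pointer (e.g. via SETFILE):

Code:

ANALYZE.FILE MYHASH
COUNT MYHASH

ANALYZE.FILE reports the file type and sizing statistics, and COUNT forces every group to be read, which will usually surface a damaged group.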

Otherwise, where are you performing any floating point arithmetic?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
TBartfai
Premium Member
Posts: 15
Joined: Wed Jul 28, 2004 5:26 am

Post by TBartfai »

First, thanks for your reply.
Unfortunately it is not rare :(

It happened this week in two different processes with two different hashed files. I suspect the hashed file itself is responsible, since it looks wrong in the file system and disabling memory pre-load eliminates the problem.
But I would like to keep pre-load turned on in production.

Could you please explain how I can check the structural integrity of the hashed file?

Three jobs are failing on the same hashed file. The file is supposed to be created as Type 30 (dynamic), and 'Allow stage write cache' is turned on.

The funny thing is that reading from the hashed file as the driver works fine.
I mean the following job:

(corrupted?) hashed file --> Transformer --> seq file
                                  ^
                                  |
                          another hashed file


But this one fails:
Oracle --> Transformer --> seq file
               ^
               |
    (corrupted?) hashed file


Unfortunately, the affected hashed file in the UNIX directory does not look like a normal one: there are separate files for each record, where the file name is the key of the hashed file.


We are not performing any mathematical operations, just looking up data from the hashed file.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

TBartfai wrote: Unfortunately, the affected hashed file in the UNIX directory does not look like a normal one: there are separate files for each record, where the file name is the key of the hashed file.
I forget what silly type this is, but it is one of the ways a dynamic hashed file can 'corrupt' itself, especially if you are deleting and recreating it each time. The loss of the hidden .Type30 file there for any reason can revert it to this type, where every record is a separate file. Best to remove the contents yourself so it can properly rebuild itself.
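
Roughly what to look for at the UNIX level (the path is a placeholder, and this is only a sketch; make sure no job or client has the file open first):

Code:

cd /path/to/project
ls -a MyHashedFile
#  healthy dynamic file:  .Type30  DATA.30  OVER.30
#  reverted file:         one O/S file per record, named after the key

rm -rf MyHashedFile
#  the next run of a job with 'Create file' set should rebuild it properly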
-craig

"You can never have too many knives" -- Logan Nine Fingers
TBartfai
Premium Member
Posts: 15
Joined: Wed Jul 28, 2004 5:26 am

Post by TBartfai »

Deleting the hashed file solves the problem; I forgot to mention that I had already tried that :oops:

The reason I am so interested in this case is that our customer does not accept this workaround and keeps insisting on opening a defect against us to have this behaviour prevented :(

Do you happen to know how to prevent this corruption?

BTW, you hit the nail on the head; we are recreating almost every hashed file again and again, every day.

Or could moving to a static hashed file type eliminate this behaviour?
I would rather not do that, as it would affect all our processes and jobs :(

Thanks for your reply; any further help would be much appreciated.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

TBartfai wrote: BTW, you hit the nail on the head; we are recreating almost every hashed file again and again, every day.
Stop. :wink:

While the norm is to rebuild the contents of hashed files run to run, there usually isn't an overwhelming need to delete and recreate them each run as well. Why not switch to simply clearing them each time?
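
In job terms that is just the 'Clear file before writing' option on the Hashed File stage. At TCL level a rough equivalent (MYHASH is a placeholder VOC name; a file created by pathname would first need a VOC pointer, e.g. via SETFILE) would be:

Code:

CLEAR.FILE MYHASH

Same end result as delete-and-recreate, but the file keeps its .Type30 marker and its creation settings, so there is nothing to 'revert'.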
-craig

"You can never have too many knives" -- Logan Nine Fingers
TBartfai
Premium Member
Posts: 15
Joined: Wed Jul 28, 2004 5:26 am

Post by TBartfai »

:oops:

I meant that we have the 'Create file' option, 'Allow stage write cache' and 'Clear file before writing' enabled.
It is not creating the file again and again.

I have to take a rest before writing silly things. :)

But I am still interested in how to avoid this floating point exception; maybe we will open a case...
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Craig has already identified the issue - the missing (hidden) .Type30 file. Often this happens when someone copies the two visible files without noticing the hidden third one. Does your process include any scripts that might be moving these hashed files around?
netland
Participant
Posts: 12
Joined: Tue Apr 08, 2003 11:43 pm

Post by netland »

Just two things that I can come up with:

a missing .Type30 file,
and too many sub-directories; I think it was a 32K limit.

Besides not working, it will also make the hashed file very slow :o/
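
A quick way to see how many entries a hashed file directory actually holds (the path is a placeholder; a healthy dynamic file should only show .Type30, DATA.30 and OVER.30):

Code:

ls -A /path/to/MyHashedFile | wc -l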


And please help your customer: find a solution, not a workaround... we have seen way too many workarounds in DS :evil:
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:? Hmmm... I can't think of anything off the top of my head that I've had to 'work around'. And I believe that 'too many subdirectories' issue is specific to the AIX operating system. Not much you can do there other than live with that limitation.

The OP may be running into resource issues. I'm curious if there has been any attempt to 'tweak' the settings in the uvconfig file for DataStage to help with this? For example, flirting with the edge of this parameter could cause the issue you are seeing, I do believe:

Code:

# T30FILE - specifies the number of
#       dynamic files that may be opened.
#       Used to allocate shared memory
#       concurrency control headers.
T30FILE 200
Are your values 'out of the box'?
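
If it does need changing, remember that the edit only takes effect after the engine configuration is regenerated. Roughly, as a sketch (assuming $DSHOME points at the engine directory and no jobs or clients are connected):

Code:

cd $DSHOME
bin/uv -admin -stop       # stop the server engine
vi uvconfig               # adjust T30FILE
bin/uvregen               # regenerate the binary configuration
bin/uv -admin -start      # restart the engine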
-craig

"You can never have too many knives" -- Logan Nine Fingers
TBartfai
Premium Member
Posts: 15
Joined: Wed Jul 28, 2004 5:26 am

Post by TBartfai »

Again thanks for all your answers :)

We are not doing any manual or home-grown hashed file manipulation; we are only using the built-in stage properties.

But I have checked the uvconfig files on our test, production-like and production servers, and the T30FILE value is either 2000 or 2050.

I do not exactly understand why it was set that way :(
Maybe it is because we are using many hashed files for lookups instead of database lookups. There are also hashed files of around 3 GB :roll:
I know that is not recommended :); we are currently working on getting rid of them.