
Hash file write failures

Posted: Wed Jan 19, 2005 10:59 am
by chulett
Was curious if others have seen this issue.

Our hash files are rebuilt every cycle. The option to 'Delete' first is checked, so they should be getting 'dropped' and recreated each cycle. However, that doesn't always seem to be what is happening.

This morning, a hash build failed with a write error:

Code: Select all

ds_uvput() - Write failed for record id '110215046260485926'
No null key fields, no space issues, just a failure. Rerun the job and it fails again. For what it's worth, this is a very intermittent problem. This particular job had run fine for several weeks before this.

What we have to do to solve the problem is delete the hash file ourselves from the command line (both the hash and the dictionary file) and then everything is fine. These are pathed hashes, btw, not sure if that is related or not.
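
For reference, the manual cleanup is just an OS-level delete of the data file and its dictionary before the next run - something like the sketch below, where the directory, file name and D_ dictionary naming are all placeholders for whatever your stage actually created:

Code: Select all

# Placeholders - substitute your own directory and hashed file name.
HASH_DIR=/data/hash
HASH_NAME=CUST_HASH

# A static (Type 2/18) pathed hashed file is a single OS file; the stage
# normally creates its dictionary as D_<name> alongside it.
rm -f "$HASH_DIR/$HASH_NAME" "$HASH_DIR/D_$HASH_NAME"

# A dynamic (Type 30) file is a directory (DATA.30/OVER.30) instead:
# rm -rf "$HASH_DIR/$HASH_NAME" "$HASH_DIR/D_$HASH_NAME"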

Posted: Wed Jan 19, 2005 11:02 am
by kcbland
Any caching, any multi-instance jobs all writing to the same hash file, or is this something really straightforward?

Posted: Wed Jan 19, 2005 11:04 am
by Sainath.Srinivasan
The value you have as the id is the actual key in the hashed file that you have opted for. When you drop the hashed file and recreate it, you are rebuilding the dictionary. Hence I assume that the format in which it was initially created does not match the incoming record.

Can you therefore identify this record using the key and check whether it is in any way different from your structure?

Posted: Wed Jan 19, 2005 11:09 am
by chulett
Plain vanilla singleton job, OCI -> Transformer -> Hash with no write caching. The only 'odd' thing about this particular one is that it is a Type 2 rather than the default dynamic Type 30.

Posted: Wed Jan 19, 2005 11:10 am
by kcbland
Sainath.Srinivasan wrote:The value you have as the id is the actual key in the hashed file that you have opted for. When you drop the hashed file and recreate it, you are rebuilding the dictionary. Hence I assume that the format in which it was initially created does not match the incoming record.

Can you therefore identify this record using the key and check whether it is in any way different from your structure?
This is a common misperception. Hash files have no internal integrity. The dictionary is irrelevant. You don't need a dictionary to read and write data to a hash file. You can delete the dictionary file and a hash file will still work.

Posted: Wed Jan 19, 2005 11:14 am
by kcbland
Other than the "something weird happened" explanation, I'd also make sure row buffering is turned off. I've found unexplainable errors with it enabled on certain servers. The same job occasionally fails on one server (dev) but works fine on the prod server. Any chance you're having disk space issues?

Other than that, if it's the same row every single time, then of course there's something wrong with that row.

Posted: Wed Jan 19, 2005 11:29 am
by chulett
I'm afraid this is falling into the Something Weird camp. Got plenty of space and row buffering is off by default - and we're using the project defaults. Unless...

There *are* a couple of IPC stages downstream of the transformer that does the lookup on this hash after it is built. Not sure how that would affect the hash build, but... :?

And it doesn't seem to be the same row. The first attempt to simply restart the job cratered again - same error, different record id. It then ran to completion, using the same data selection, after simply deleting the hash file first.

Posted: Wed Jan 19, 2005 12:26 pm
by kcbland
If you de-couple the building of the hash file from the reference usage later on in the logic, i.e. break it into two jobs, does the hash file error still occur?

Posted: Wed Jan 19, 2005 12:50 pm
by chulett
Good question. I can give it a shot, but since the job ran fine for weeks before this it might take some time to know if it fixed anything.

I can do that as a PM, but was hoping someone might have some magic answer for this. Perhaps I'll open a support case with Ascential, let them figure out what might be going on. :wink:

Posted: Thu Jul 21, 2005 12:42 am
by dhiraj
Craig,
I am getting the same errors now. Can you let us know what was the response from Ascential?

Thanks

Dhiraj

Posted: Thu Jul 21, 2005 1:05 am
by ray.wurlod
Write failure can occur for a number of reasons. The obvious one is that you don't have write permission at the operating system level. It may have been created as a UV table to which you have not been granted the requisite SQL privileges.
It can also occur if the hashed file has become corrupted, or its indexes have become corrupted. A quick test is COUNT hashedfilename or SELECT COUNT(*) FROM hashedfilename; - if there is any internal corruption, either of these ought to generate an error message (there is a command-line sketch at the end of this post).
There is no apparent problem with the key value as reported (the maximum key size is at least 255 characters). However, keys containing field marks, value marks, etc. are prohibited and will therefore also cause write failures.
Other possibilities with less likelihood include violating an SQL constraint on the UV table or having the row rejected by a trigger on the UV table.
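
By way of illustration, the first two checks might look like this from the command line - every path, file name and the SETFILE pointer step below are placeholders, so adjust them to your install:

Code: Select all

# 1. OS-level permission check on a pathed hashed file and its dictionary
#    (directory and names are placeholders).
ls -l /data/hash/CUST_HASH /data/hash/D_CUST_HASH

# 2. Internal integrity check from the engine shell, run in the project
#    directory. The SETFILE step just gives the pathed file a VOC pointer
#    so COUNT and SQL can see it.
cd /path/to/project            # the DataStage project directory (placeholder)
. $DSHOME/dsenv                # assumes DSHOME is already set for the engine
$DSHOME/bin/uvsh <<'EOF'
SETFILE /data/hash/CUST_HASH CUST_HASH OVERWRITING
COUNT CUST_HASH
SELECT COUNT(*) FROM CUST_HASH;
EOF
# Any internal corruption should surface as an error here rather than a
# clean record count.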

Posted: Thu Jul 21, 2005 3:21 am
by ranga1970
A rat jumping between the elephants :x - don't crush me if I am wrong, but what is the probability that this hash file was created by a user with higher privileges, and now a user with lower privileges is trying to run this job and cannot overwrite the file because he does not have modification privileges?


thanks

Posted: Thu Jul 21, 2005 3:51 am
by ArndW
Ranga,

Good approach, but in this case any write would cause the failure, and it seems that only a specific record is failing. My gut feeling is that there is an @FM or @VM in the key field which is causing the failure, but that doesn't explain the sporadicity (yes, that word does not exist in the dictionary). I wonder if the error might change if you used a different file type (15 or whatever matches the key distribution).
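
One cheap way to test the mark-character theory is to scan the key values before the load. A rough sketch, assuming the key column has been dumped to a flat file first (keys.txt is a made-up name):

Code: Select all

# Field marks (0xFE / octal 376) and value marks (0xFD / octal 375) are not
# allowed in hashed file keys. keys.txt is a hypothetical dump of the key
# column from the OCI source; a non-zero count means at least one bad key.
tr -dc '\375\376' < keys.txt | wc -c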

Posted: Thu Jul 21, 2005 6:52 am
by chulett
dhiraj wrote:Can you let us know what was the response from Ascential?
Wow... I'd forgotten all about this. I never did open a support case and this error hasn't happened again. :?

For us it had nothing to do with privileges because a single generic userid is (and always has been) used to run all jobs. No UV table issues as it was a Type 2 pathed hash. It seemed obvious that it was corrupted in some fashion and that the corruption was preventing the 'delete and then recreate' option from truly working, as manually removing the file sorted out the issue.

Posted: Thu Jul 21, 2005 8:02 am
by kduke
Craig

We had a memory leak which caused problems like this. It had nothing to do with the hash file. It was running out of process memory. Do a ps command to see if the process is growing.
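
Something along these lines, repeated while the hash build is running, shows whether the process footprint keeps climbing - ps option support and the grep pattern vary by platform, so treat it as a sketch:

Code: Select all

# Sample the job's process size every 30 seconds during the hash file build.
# 'phantom' is a guess at the DataStage job process name; adjust the pattern
# (or grep for the job's PID) on your platform.
while true; do
  date
  ps -eo pid,vsz,rss,args | grep '[p]hantom'
  sleep 30
done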

I would try a type 18 file as well. Maybe type 2 is the problem. Eliminate every possibility.

Just a couple of ideas.