Hello folks:
Is there any chance that a hash file gets corrupted (hopefully the OS isn't the source of the corruption)?
What are the best practices for handling a hash file? What is the size limit of a hash file in DataStage 6.0?
Smile forever,
T.Vijay
Hash file corruption
Hi,
Hash files are basically used as lookup references to make fetches fast. To get the maximum performance, select only the columns that are required when creating the file. Hash files can get corrupted, but always make sure that you can re-create the hash file by re-running your job.
Also, a hash file has a maximum size of 2 GB by default. To grow a hash file beyond that, you need to create it with the 64-bit option.
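For example (a sketch from memory, so please verify the exact syntax against your release's documentation), an existing hashed file can be converted to 64-bit addressing from the Administrator client's Command window or the TCL prompt with something like

    RESIZE MyHashedFile * * * 64BIT

where MyHashedFile is just a placeholder for your own file name, and the three asterisks keep the current type, modulus and separation unchanged.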
Thanks
Rasi
Since hashed files are implemented as operating system files, it is possible for operating system events to trash them. I once saw this occur when a disk "repair" tool decided to truncate the OVER.30 file (for some unreported reason that to this day remains a mystery).
It is also possible, as rasi noted, for a default hashed file to become corrupted by trying to extend it beyond 2GB.
There are a couple of extremely low probability events in the file manager for DataStage that can leave hashed files corrupted if power is lost during a write but, by and large, they are fairly robust. Well-tuned hashed files (with few or zero overflowed groups and, ideally, few or zero oversized records) are the least vulnerable.
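If you want to check how well tuned a hashed file is (again a rough sketch; the exact report keywords vary between releases, so check the UniVerse help), you can run something like

    ANALYZE.FILE MyHashedFile

from the Administrator Command window, where MyHashedFile is a placeholder for a VOC pointer to your file; the report shows the file type and modulus, and with the statistics options it will also tell you how many groups have overflowed.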
Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
War does not decide who is right, it only decides who is left.
(Bertrand Russell)
I agree, having been in one.
(Ray Wurlod)
Hello Folks:
If the hash file gets corrupted (partially or fully), will there be an error whenever we use that particular hash file in an ETL job, or do we only find out after the junk data has been loaded into the target stage?
BTW, thanks Rasi and Ray for your scintillating (as usual) replies.
Smile forever,
T.Vijay
The usual error message is "unable to open", which will abort your job fairly quickly - certainly before any rows are processed.
Do you have any evidence for believing that a hashed file has become corrupted, or are you just gathering knowledge?
Sometimes, if an error occurs and a job aborts, there is information in the &PH& directory. This is usually loaded into the job log when the job is reset.
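If you want to look at those &PH& entries directly (assuming a Unix installation; /path/to/project below is just a placeholder for your actual project directory), you can list the most recent ones from the shell:

    cd /path/to/project
    ls -lrt '&PH&' | tail

The quotes are needed because the & characters are special to the shell; the newest entries, listed last, belong to the most recent job runs.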