Hash file corruption

Post questions here relating to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

vijay
Participant
Posts: 10
Joined: Tue Apr 15, 2003 10:32 am

Hash file corruption

Post by vijay »

Hello folks:
Is there any chance that a hash file gets corrupted (assuming the OS isn't the source of the corruption)?
What are the best practices for handling a hash file? What is the size limit for a hash file in DataStage 6.0?

Smile forever,
T.Vijay
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

Hi,

Hash files are mainly used as lookup references to make fetches fast. To get the maximum performance, select only the columns that are actually required when you create the file. Hash files can get corrupted, so always make sure that you can re-create the hash file by re-running your job.
By default a hash file has a maximum size of 2 GB. To grow a hash file beyond that, you need to create it with the 64-bit option.
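A rough sketch, run from the Administrator client's Command window (the exact syntax varies a little between releases, so check the CREATE.FILE and RESIZE documentation for your version; the file names here are just placeholders):

   CREATE.FILE MyLookupHash DYNAMIC 64BIT
   RESIZE MyExistingHash * * * 64BIT

The first command creates a new dynamic hashed file with 64-bit internal addressing, so it can grow past 2 GB; the second converts an existing account-level hashed file in place, keeping its current type, modulus and separation.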

Thanks
Rasi
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Since hashed files are implemented as operating system files, it is possible for operating system events to trash them. I once saw this occur when a disk "repair" tool decided to truncate the OVER.30 file (for some unreported reason that to this day remains a mystery).

It is also possible, as rasi noted, for a default hashed file to become corrupted by trying to extend it beyond 2GB.

There are a couple of extremely low-probability events in the DataStage file manager that can leave hashed files corrupted if power is lost during a write, but, by and large, they are fairly robust. Well-tuned hashed files (with few or zero overflowed groups and, ideally, few or zero oversized records) are the least vulnerable.
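To see how well tuned a hashed file is, something like the following from the Administrator client's Command window should do it (a sketch only; the exact report layout differs between releases, and MyLookupHash is just a placeholder name):

   ANALYZE.FILE MyLookupHash STATISTICS

The report includes the file type, modulus and separation, together with group-level figures from which you can see how many groups have overflowed and whether any records are oversized; few or zero of either is what you are aiming for.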


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518

War does not decide who is right, it only decides who is left.
(Bertrand Russell)
I agree, having been in one.
(Ray Wurlod)
vijay
Participant
Posts: 10
Joined: Tue Apr 15, 2003 10:32 am

Post by vijay »

Hello Folks:

If the hash file gets corrupted (partially or fully), will there be an error whenever we use that particular hash file in an ETL job, or do we need to find that out only after loading junk data into the target stage?

BTW, thanks Rasi and Ray for your scintillating (as usual) replies.

Smile forever,
T.Vijay
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The usual error message is "unable to open", which will abort your job fairly quickly - certainly before any rows are processed.
Do you have any evidence for believing that a hashed file has become corrupted, or are you just gathering knowledge?
Sometimes, if an error occurs and a job aborts, there is information in the &PH& directory. This is usually loaded into the job log when the job is reset.
vijay
Participant
Posts: 10
Joined: Tue Apr 15, 2003 10:32 am

Post by vijay »

Hello Ray:
I am planning to use a number of hash files for lookups against very large tables, so I am just gathering the pros and cons of using them. One of our clients had a bad experience with corrupted hash files.

Smile forever,
T.Vijay
azens
Premium Member
Posts: 24
Joined: Tue Feb 25, 2003 11:59 pm

Post by azens »

Hi,

In my experience, a static hash file is more stable than a dynamic one, especially when you read and write the same hash file within one job or in jobs that run concurrently. I always synchronize the hash file with the table to guarantee consistency and stability.
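In case it helps, a rough example of creating a static hashed file from the Administrator client's Command window (the type, modulus and separation are purely illustrative, so size them from your own data volumes, and the file name is a placeholder):

   CREATE.FILE MyStaticHash 18 4001 4

That creates a type 18 static hashed file with a modulus of 4001 and a separation of 4. If I remember rightly, the same choices can be made through the Hashed File stage's create-file options.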

Azens Chang
MetaEdge Corp.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

And, as I noted earlier in this thread, well-tuned hashed files are the least vulnerable of all.