Unable to open file: file corruption detected

Post questions here relating to DataStage Server Edition, in areas such as Server job design, DS BASIC, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

eoyylo
Participant
Posts: 57
Joined: Mon Jun 30, 2003 6:56 am

Unable to open file: file corruption detected

Post by eoyylo »

Hi,
I use DataStage on a Sun server.
The data load uses several jobs with these steps:
1) Read the data from a remote DB and create a hashed file. During this step the job reads 27,000,000 records from the remote DB and everything works fine.
2) The next job should read that hashed file, but DataStage aborts with this error description:

DataStage Job 594 Phantom 3946
Program "DSD.UVOpen": Line 335, WARNING: Internal file corruption detected during file open!
File must be repaired, possible truncation.
hsize: 2048
bsize: 2048
fsize: 2147483647
Job Aborted after Fatal Error logged.
Program "DSD.WriteLog": Line 192, Abort.
Attempting to Cleanup after ABORT raised in stage H3GITmisSTGtoDDSITADMMTRAFFOLOItc3..T_ITC
DataStage Phantom Aborting with @ABORT.CODE = 1

Can anyone help me resolve this problem?

Does a size limit exist for a hashed file?

Is it possible to calculate the disk space needed for a hashed file based on the number and size of the records?

Thanks in advance
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Yes, you have hit the 2GB size limit for a default hashed file. This limit is caused by internal pointers being implemented as 32-bit. You can specify 64-bit pointers when creating a hashed file but, unfortunately, the Options dialog on the hashed file stage does not provide for it.
Use the Hashed File Calculator (in the Utilities\Unsupported folder on your DataStage CD) to generate the appropriate command (saves you having to know or do the calculations), then copy this command. Use CREATE.FILE to create the hashed file in an account (= project directory), or mkdbfile to create it in a directory (provide the entire pathname rather than just the filename in this case).
Separation is the size of each group in units of 512 bytes; for a dynamic hashed file you can only use 4 (2KB groups) or 8 (4KB groups).
Multiply the (number of groups + 1) by the group size to get the minimum HDD size of the hashed file; this assumes perfect tuning. Add approximately 50% to allow for likely imperfect tuning.
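For example (just a sketch; the hashed file name and modulus below are placeholders, use the figure the calculator gives you):

CREATE.FILE MYHASH DYNAMIC MINIMUM.MODULUS 2000000 64BIT

With 2KB groups (separation 4) that works out to roughly (2,000,000 + 1) x 2,048 bytes, a little over 4GB as the minimum footprint, so budget around 6GB of disk once you add the 50% margin.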


Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
ariear
Participant
Posts: 237
Joined: Thu Dec 26, 2002 2:19 pm

Post by ariear »

There's an option in the uvconfig file called 64BIT_FILES which can be turned on; it means that new files will be created with 64-bit pointers by default.
I didn't use it, so I'm not sure it works.
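If I recall correctly it's just a tunable line in uvconfig (standard engine layout assumed, so check your own install):

# in $DSHOME/uvconfig
64BIT_FILES 1

and you have to run bin/uvregen and restart the engine before the change takes effect.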
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

NEVER set 64BIT_FILES on.
It does work, which is why I give this advice.
What you get is EVERY hashed file, including the small ones and the ones that DataStage uses internally (that is, the Repository), being capable of exceeding 2GB.
Note also that some platforms, particularly Windows, don't support 64-bit addressing. On these platforms you don't have a choice about 32-bit or 64-bit; to go beyond 2GB you need to implement Distributed hashed files (which don't do the memory cache thing).
ariear
Participant
Posts: 237
Joined: Thu Dec 26, 2002 2:19 pm

Post by ariear »

Interesting!
Do hashed files smaller than 2GB but with 64-bit pointers behave unexpectedly?

I wish I could cache a 2GB hashed file, but the tunables entry in Administrator only goes up to 999MB.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

No, they work fine, but it's a waste of space to use 64-bit pointers unnecessarily.
As far as the maximum cache size is concerned, you can set this higher but not through the client. This is one you need to take up with Ascential!
Read dskcache.pdf in the documentation set, particularly the cache size options for the CREATE.FILE command.
ariear
Participant
Posts: 237
Joined: Thu Dec 26, 2002 2:19 pm

Post by ariear »

Great! I'll look it up. [:)]
hughsm
Participant
Posts: 12
Joined: Sun May 04, 2003 12:27 am

Post by hughsm »

This is great stuff! I deal with a lot of DB2 tables with records in the tens of millions. When should I use a hashed file, and when should I just do a join with a constraint on the tables?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Typically a hashed file will have only two or three columns, so your requirement for 10 million rows should easily be accommodated in a 2GB hashed file. If not, you can create a larger hashed file, but you must explicitly state that you require 64-bit addressing, for example:
CREATE.FILE MYFILE DYNAMIC MINIMUM.MODULUS 4000000 64BIT
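
As a rough check (assuming 2KB groups), 4,000,000 x 2,048 bytes is already about 8.2GB, well past the 2GB limit of a 32-bit file, which is why the 64BIT keyword is needed in this example.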

Or you can use Distributed files, but you lose all the memory caching benefit of hashed files by doing so.

Yes, you can perform joins within DB2 at extraction time and, if this leads to fewer rows being processed through DataStage, it is likely to reduce overall throughput time.

The big no-no is reference inputs (lookups) to a remote database server; this is inefficient in its use of network packets and slows the DataStage job while it awaits the result of each lookup.

It's almost time for a new thread - this one is wandering somewhat away from its original title! [V]

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518