Hash File Question

Post questions here relating to DataStage Server Edition, in such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

patonp
Premium Member
Posts: 110
Joined: Thu Mar 11, 2004 7:59 am
Location: Toronto, ON


Post by patonp »

I've come across a job that selects data from a database table and populates a hash file. In particular, one source column has a datatype of char(40) in the source system. However, in DataStage the column has been defined incorrectly as char(10) and is passed through as such to the hash file where the column is also defined as char(10). (It's the last column in the hash file.)

I realize that this discrepancy must be fixed, however... when I view the hash file, I can see all 40 characters contained in the source data, not just the first ten. Will this difference between the defined maximum column width and the actual values cause the lookup results to be incorrect, or is the tool fairly forgiving in these types of cases?

Thanks!

Peter
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Very forgiving, actually. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
patonp
Premium Member
Posts: 110
Joined: Thu Mar 11, 2004 7:59 am
Location: Toronto, ON

Post by patonp »

From the perspective of the processing logic and end results, is there any downside to the way it's defined? (I'm trying to determine how urgently a fix needs to be implemented...)
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

DataStage hashed files don't have datatype-based limitations. Everything in the record is stored as one long contiguous string. The data types and lengths you specify in the metadata are mainly for use in other stages, so there is no need to modify your hashed file metadata just to reflect accurate string lengths.

It is important to note, though, that a CHAR(10) column read from a hashed file that actually contains up to 40 characters will cause problems when written to an Oracle CHAR(10) column, since DS won't automatically or implicitly truncate the data.
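If you do need to fit such a value into a CHAR(10) target, you have to truncate it explicitly yourself, for example with a substring in the output derivation. A minimal DataStage BASIC sketch, with made-up names:

Code: Select all

    * Explicit truncation: keep only the first ten characters
    * before the value reaches the Oracle CHAR(10) column.
    ShortValue = FullValue[1, 10]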
patonp
Premium Member
Posts: 110
Joined: Thu Mar 11, 2004 7:59 am
Location: Toronto, ON

Post by patonp »

Sorry to keep this thread going, but now you've got me interested! If each row of the hash file is stored as one contiguous string, then how are the key fields identified and internally stored?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

The key is stored separately from the data. A hashed file has only one unique key string; the multiple or compound keys used in DS jobs are actually stored as a single string with a specific separator character.
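For illustration, here is how a two-part key could be built and taken apart in DataStage BASIC; a sketch with invented names, assuming the text mark (@TM) as the separator, which a later post in this thread describes:

Code: Select all

    * Two logical key columns become one physical hashed file key.
    Key = CustomerNo : @TM : OrderNo   ;* concatenate with the separator
    Part1 = FIELD(Key, @TM, 1)         ;* CustomerNo back out
    Part2 = FIELD(Key, @TM, 2)         ;* OrderNo back out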
patonp
Premium Member
Premium Member
Posts: 110
Joined: Thu Mar 11, 2004 7:59 am
Location: Toronto, ON

Post by patonp »

Thanks for the responses!

Cheers,

Peter
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Actually the key is a part of the string. Each level has a separate separator, starting with char(255), the item mark (@IM), which cannot be used in the data at all; this is what separates the key from the record. Char(254) is called a field mark (@FM) and separates the fields within the record. Char(253) is called a value mark (@VM); a value mark separates multiple values, or an array, at the column level. Char(252) is called a subvalue mark (@SM). Char(251) is called a text mark (@TM); @TM is used to separate the parts of multi-part keys.
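A small BASIC illustration of how the field and value marks build up a record (the data is made up):

Code: Select all

    * @FM = CHAR(254) and @VM = CHAR(253) are system constants in BASIC.
    Rec = ''
    Rec<1> = 'Smith'                ;* field 1; a field mark is inserted
    Rec<2> = 'red' : @VM : 'blue'   ;* field 2 holds two values
    CRT Rec<2, 1>                   ;* prints "red"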
Mamu Kim
patonp
Premium Member
Posts: 110
Joined: Thu Mar 11, 2004 7:59 am
Location: Toronto, ON

Post by patonp »

So... is all metadata (data type, length, etc.) other than the key column indicator ignored as far as internal processing and data storage are concerned?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Yes if accessed natively; no if accessed via SQL, which does enforce data types and so on. If the hashed file is created as a UV table, however, then data types and security and integrity constraints are honoured at all times.

The only exception is where you have created triggers - these cannot be bypassed at all. But this is not something currently practised in DataStage, mainly because it's not required.
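By way of illustration, a UV table is created with UniVerse SQL rather than CREATE.FILE, after which the declared types are enforced. A minimal sketch, executed here from BASIC; the table and column names are invented:

Code: Select all

    * Create the hashed file as a UV table so SQL enforces CHAR lengths.
    EXECUTE 'CREATE TABLE CUSTDIM (CUSTNO CHAR(10) NOT NULL, CUSTNAME CHAR(40), PRIMARY KEY (CUSTNO));'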
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.