Hash File Question

Post questions here relating to DataStage Server Edition, in such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

patonp
Premium Member
Posts: 110
Joined: Thu Mar 11, 2004 7:59 am
Location: Toronto, ON


Post by patonp »

I've come across a job that selects data from a database table and populates a hash file. In particular, one source column has a datatype of char(40) in the source system. However, in DataStage the column has been defined incorrectly as char(10) and is passed through as such to the hash file where the column is also defined as char(10). (It's the last column in the hash file.)

I realize that this discrepancy must be fixed, however... when I view the hash file, I can see all 40 characters contained in the source data, not just the first ten. Will this difference between the defined maximum column width and the actual values cause the lookup results to be incorrect, or is the tool fairly forgiving in these types of cases?

Thanks!

Peter
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Very forgiving, actually. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
patonp
Premium Member
Posts: 110
Joined: Thu Mar 11, 2004 7:59 am
Location: Toronto, ON

Post by patonp »

From the perspective of the processing logic and end results, is there any downside to the way it's defined? (I'm trying to determine how urgently a fix needs to be implemented...)
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

DataStage hashed files don't have datatype-based limitations. Everything in the record is stored as one long contiguous string. The data types and lengths you specify in the metadata are mainly for use in other stages, so there is no need to modify your hashed file metadata just to reflect accurate string lengths.

It is important to note, though, that a CHAR(10) column read from a hashed file that actually contains up to 40 characters will cause problems when written to an Oracle CHAR(10) column, since DS won't automatically or implicitly truncate the data.
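If you do need to fit such a value into a CHAR(10) target, you have to truncate it explicitly yourself, for example with a substring in the output derivation. A minimal DataStage BASIC sketch, with made-up names:

Code: Select all

    * Explicit truncation: keep only the first ten characters
    * before the value reaches the Oracle CHAR(10) column.
    ShortValue = FullValue[1, 10]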
patonp
Premium Member
Posts: 110
Joined: Thu Mar 11, 2004 7:59 am
Location: Toronto, ON

Post by patonp »

Sorry to keep this thread going, but now you've got me interested! If each row of the hash file is stored as one contiguous string, then how are the key fields identified and internally stored?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

The key is stored separately from the data. A hashed file has only one unique key string; the multiple or compound keys used in DS jobs are actually stored as a single string with a specific separator character.
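For illustration, here is how a two-part key could be built and taken apart in DataStage BASIC; a sketch with invented names, assuming the text mark (@TM) as the separator, which a later post in this thread describes:

Code: Select all

    * Two logical key columns become one physical hashed file key.
    Key = CustomerNo : @TM : OrderNo   ;* concatenate with the separator
    Part1 = FIELD(Key, @TM, 1)         ;* CustomerNo back out
    Part2 = FIELD(Key, @TM, 2)         ;* OrderNo back out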
patonp
Premium Member
Premium Member
Posts: 110
Joined: Thu Mar 11, 2004 7:59 am
Location: Toronto, ON

Post by patonp »

Thanks for the responses!

Cheers,

Peter
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Actually the key is a part of the string. Each level has a separate separator, starting with char(255), the item mark (@IM), which cannot be used in the data at all; this is what separates the key from the record. Char(254) is called a field mark (@FM) and separates the fields within the record. Char(253) is called a value mark (@VM); a value mark separates multiple values, or an array, at the column level. Char(252) is called a subvalue mark (@SM). Char(251) is called a text mark (@TM); @TM is used to separate the parts of multi-part keys.
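A small BASIC illustration of how the field and value marks build up a record (the data is made up):

Code: Select all

    * @FM = CHAR(254) and @VM = CHAR(253) are system constants in BASIC.
    Rec = ''
    Rec<1> = 'Smith'                ;* field 1; a field mark is inserted
    Rec<2> = 'red' : @VM : 'blue'   ;* field 2 holds two values
    CRT Rec<2, 1>                   ;* prints "red"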
Mamu Kim
patonp
Premium Member
Posts: 110
Joined: Thu Mar 11, 2004 7:59 am
Location: Toronto, ON

Post by patonp »

So... is all metadata (data type, length, etc.) other than the key column indicator ignored as far as internal processing and data storage are concerned?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Yes if accessed natively; no if accessed via SQL, which does enforce data types and so on. If the hashed file is created as a UV table, however, then data types and security and integrity constraints are honoured at all times.

The only exception is where you have created triggers - these cannot be bypassed at all. But this is not something currently practised in DataStage, mainly because it's not required.
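By way of illustration, a UV table is created with UniVerse SQL rather than CREATE.FILE, after which the declared types are enforced. A minimal sketch, executed here from BASIC; the table and column names are invented:

Code: Select all

    * Create the hashed file as a UV table so SQL enforces CHAR lengths.
    EXECUTE 'CREATE TABLE CUSTDIM (CUSTNO CHAR(10) NOT NULL, CUSTNAME CHAR(40), PRIMARY KEY (CUSTNO));'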
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.