which hashed lookup is faster? Int or String?

htrisakti3 · Post by **htrisakti3** » Sun Apr 15, 2007 6:22 pm

For lookup field in the hashed file, Is there any performance difference whether data type Integer is faster than varchar ?

The field is ph# about str(30), and the lookup contains about 7.5million records.

thanks - HT

kcbland · Post by **kcbland** » Sun Apr 15, 2007 7:52 pm

All data in a hashed file is string based. There are no explicit datatypes. It makes no difference if you're using a General hashing algorithm. Since you're using phone numbers, which is string based and can contain delimiters, you've probably not changed the default hashing parameters which require integer values sequentially assigned without gaps.

If you're having lookup performance issues, it's probably because you haven't presized your file or it's too big to be efficient and you'd be better served using a partitioning method and multiple hashed files.

ray.wurlod · Post by **ray.wurlod** » Sun Apr 15, 2007 9:26 pm

As Ken said, there are no data types, and therefore no differences based upon data type.

The size of the hashed file is theoretically irrelevant, because the hashing algorithm means that there should be exactly one logical I/O generated for a lookup. No index, no table scan.

In practice, of course, you need a well-tuned hashed file for this optimum situation to occur. Eliminating unnecessary columns from the hashed file, so that average record size is as small as possible, is one of the best things you can do. Eliminating unnecessary records (those that will never be looked up, such as expired dimension records) will also help.

Records larger than the LARGE.RECORD parameter will be particularly painful. Prefer to over-size than to under-size hashed files, so that there is less group overflow.