which hashed lookup is faster? Int or String?

htrisakti3
Charter Member
Posts: 36
Joined: Thu Jun 10, 2004 11:22 pm

Post by htrisakti3 »

For the lookup key field in a hashed file, is there any performance difference between data types? Is an Integer key faster than a VarChar key?

The field is a phone number, roughly a 30-character string, and the lookup file contains about 7.5 million records.

Thanks, HT
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

All data in a hashed file is string based. There are no explicit data types, so it makes no difference if you're using a general hashing algorithm. Since you're using phone numbers, which are string based and can contain delimiters, you've probably not changed the default hashing parameters to an algorithm that requires integer key values assigned sequentially without gaps.

If you're having lookup performance issues, it's probably because you haven't pre-sized your file, or because it's too big to be efficient. In the latter case you'd be better served by partitioning the data across multiple hashed files.
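The partitioning idea above can be sketched in a few lines. This is only an illustration in Python, not DataStage code: the names `partition_for`, `build_partitions` and `lookup` are hypothetical, plain dictionaries stand in for hashed files, and `zlib.crc32` is an assumed stand-in for whatever rule (last digit of the key, a key range, and so on) a real job would use to route rows.

```python
import zlib


def partition_for(key: str, partitions: int) -> int:
    """Pick a partition number for a key (assumed rule: CRC32 modulo count)."""
    return zlib.crc32(key.encode("utf-8")) % partitions


def build_partitions(rows, partitions: int):
    """Split (key, record) rows across several smaller lookup files."""
    files = [dict() for _ in range(partitions)]  # each dict stands in for one hashed file
    for key, record in rows:
        files[partition_for(key, partitions)][key] = record
    return files


def lookup(files, key: str):
    """Route the lookup to the one partition that can hold the key."""
    return files[partition_for(key, len(files))].get(key)


rows = [("415-555-0100", "Alice"), ("212-555-0199", "Bob"), ("813-555-0142", "Carol")]
files = build_partitions(rows, partitions=4)
```

The same routing rule has to be used when loading and when looking up; otherwise a key lands in one file and is searched for in another.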
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

As Ken said, there are no data types, and therefore no differences based upon data type.

The size of the hashed file is theoretically irrelevant, because the hashing algorithm means that there should be exactly one logical I/O generated for a lookup. No index, no table scan.
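A toy model may help show why the lookup cost does not depend on file size or on the "type" of the key. The class below is a hypothetical Python sketch, not the real hashed file engine: `ToyHashedFile` is an invented name, `zlib.crc32` is an assumed stand-in for the actual hashing algorithm, and each dictionary stands in for one group on disk.

```python
import zlib


class ToyHashedFile:
    """Toy static hashed file: a fixed number of groups addressed by hashing the key."""

    def __init__(self, modulus: int):
        self.modulus = modulus
        self.groups = [dict() for _ in range(modulus)]
        self.group_reads = 0  # counts logical group reads performed by lookups

    def _group_for(self, key) -> int:
        # Every key is converted to a string before hashing: there are no data types.
        return zlib.crc32(str(key).encode("utf-8")) % self.modulus

    def write(self, key, record):
        self.groups[self._group_for(key)][str(key)] = record

    def read(self, key):
        # The hash gives the group address directly: no index, no scan.
        self.group_reads += 1
        return self.groups[self._group_for(key)].get(str(key))


hf = ToyHashedFile(modulus=101)
hf.write(4155550100, "written with an integer key")
found = hf.read("4155550100")  # read back with the same key as a string
```

In this model an integer key and the equivalent string key hash to the same group, and a read touches exactly one group. What the toy leaves out is overflow: in a real file a group that is too full spills into extra buffers, which is where the tuning advice in the rest of this post comes in.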

In practice, of course, you need a well-tuned hashed file for this optimum situation to occur. Eliminating unnecessary columns from the hashed file, so that average record size is as small as possible, is one of the best things you can do. Eliminating unnecessary records (those that will never be looked up, such as expired dimension records) will also help.

Records larger than the LARGE.RECORD parameter will be particularly painful. Prefer over-sizing hashed files to under-sizing them, so that there is less group overflow.
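As a rough illustration of the sizing arithmetic, the sketch below estimates how many groups a file would need so that the data fits without overflow. Everything in it is an assumption for illustration only: the function name `estimate_min_groups` is hypothetical, the 2048-byte group, the 80% target load and the 50-byte average record are assumed figures not given in this thread, and per-record overhead is ignored. Only the 7.5 million record count comes from the original question.

```python
def estimate_min_groups(record_count: int, avg_record_bytes: int,
                        group_bytes: int = 2048, load_pct: int = 80) -> int:
    """Estimate the number of groups needed to hold the data at a target load.

    Integer arithmetic only: ceil(total_bytes / (group_bytes * load_pct / 100)).
    """
    total = record_count * avg_record_bytes * 100
    usable = group_bytes * load_pct
    return -(-total // usable)  # ceiling division


# 7.5 million records (from the question), 50 bytes each (assumed)
groups_needed = estimate_min_groups(7_500_000, 50)

# The same data after trimming unneeded columns down to 30 bytes (assumed)
groups_trimmed = estimate_min_groups(7_500_000, 30)
```

The point of the comparison is the one made above: smaller records and fewer records mean fewer groups, and rounding the estimate up rather than down leaves headroom against overflow. Real sizing should be checked against the actual file statistics.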