64Bit Hash file - Bad performance

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


eoyylo
Participant
Posts: 57
Joined: Mon Jun 30, 2003 6:56 am

64Bit Hash file - Bad performance

Post by eoyylo »

Hi,
I ran some tests to compare the performance of 32-bit hash files versus 64-bit hash files on an HP server.
The test consists of:

a flat file (1,400,595 records),
a hash file (1,400,595 records),
and another hash file holding the join of the two:

flat file ---+
             |
             +------> join -----> write hash file
             |
hash file ---+

For both the 64-bit and the 32-bit tests I used the same set of records.

I obtained the following values:

for 32-bit:
  flat file        - elapsed 10:07,    2307 rows/sec
  hash file        - elapsed 12:29,    1869 rows/sec
  hash file (join) - elapsed 6:48,     3432 rows/sec

for 64-bit:
  flat file        - elapsed 9:45,     2394 rows/sec
  hash file        - elapsed 23:27:18, 16 rows/sec
  hash file (join) - elapsed 48:37:54, 8 rows/sec



The 64-bit hash file is roughly 400 times slower than the 32-bit one.
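(From the figures above: on the join load, 3432 vs. 8 rows/sec is about a 430x slowdown, and on the plain hash load, 1869 vs. 16 rows/sec is about a 117x slowdown.)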

Can anyone explain the reason?

thanks.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

eoyylo

Exactly how did you create this 64-bit hash file? Hash file performance is based mostly on modulo size. There are many posts explaining this relationship. There is no way this file was sized properly. HASH.HELP and ANALYZE.FILE will tell you if your hash file is sized properly.
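To give a feel for the relationship (a rough back-of-the-envelope sketch, assuming a separation of 4, i.e. 2 KB groups): each group holds only about 14 records of 140 bytes, so roughly 1.4 million records need a prime modulo somewhere above 100,000. With a much smaller modulo, every group carries a long overflow chain that has to be scanned on each read and write, which is the kind of collapse your rows/sec figures show.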

Thanks Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
eoyylo
Participant
Posts: 57
Joined: Mon Jun 30, 2003 6:56 am

Post by eoyylo »

Hi Kim,
what exactly are HASH.HELP and ANALYZE.FILE?
Where can I find them?


I created the hash file with the command produced by HFC.EXE.

I calculated the average length of a record and used "no discernible pattern" for the key pattern.
The other options were:
modulo style = prime number
structure = static hashed
command = mkdbfile
addressing = 64 bit
average record size = 140


Thanks

Mario

quote: Originally posted by kduke
eoyylo

Exactly how did you create this 64-bit hash file? Hash file performance is based mostly on modulo size. There are many posts explaining this relationship. There is no way this file was sized properly. HASH.HELP and ANALYZE.FILE will tell you if your hash file is sized properly.

Thanks Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

eoyylo

If you telnet into the DataStage server you should get a login prompt. Log in with your normal user name and password. You should get the TCL prompt ">". This is the database command-line prompt. You need to LOGTO your project, then run HASH.HELP filename. This will give you a recommended modulo and separation. Then RESIZE filename to get a properly sized hash file.
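For illustration, the session looks roughly like this (MYPROJECT and MyHashFile are placeholder names; substitute the type, modulo and separation that HASH.HELP recommends):

>LOGTO MYPROJECT
>HASH.HELP MyHashFile
>ANALYZE.FILE MyHashFile
>RESIZE MyHashFile <type> <modulo> <separation>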

Thanks Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

The easiest way to do hash file configuration is to use the HFC tool in the utility folder on your installation CD. You can do a forum search for HFC to read other threads on this tool.

The HASH.HELP and ANALYZE.FILE commands are UniVerse database commands, and documentation is on the IBM web site; the version 9.6 manuals can be found at:
http://www-3.ibm.com/software/data/u2/p ... index.html

Have a look at the SQL Administration manual. There are also manuals here for UniVerse BASIC and UniVerse SQL.

Vincent McBurney
Data Integration Services
www.intramatix.com
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

My first question: why are you using 64-bit hash files? You would only need to go to this extreme if your hash file exceeds 2.2 gigabytes. You state 1.4 million rows, but how wide is this hash file?
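A quick check using the 140-byte average record length quoted earlier (ignoring group and key overhead): 1,400,595 rows x 140 bytes is roughly 196 MB, which is nowhere near the 2.2 gigabyte limit.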

You pay performance penalties for 64-bit hash files. I can't verify your performance metrics; the variables are numerous. Are you reading from and writing back to the same hash file? Did you use read-lock caching by accident? (This fills the internal lock table and progressively degrades job performance until you eventually lock the engine if improperly used - don't use it.) Are you using write-delay caching? Did you monitor CPU, memory, and disk usage during your benchmark? blah blah blah

Here's what I'd do if I were you. Try using a plain old 32-bit hash file. If you exceed 2.2 gigabytes on the DATA.30 file, then you should question whether you need all of the columns in your hash target file. If you could write the merged set out to a sequential file, you should do it! Plus, if you instantiate the job, you'll be able to realize greater throughput. If you write back to a hash file, you incur hashing overhead to park the data in the hash file, as well as "tuning". There's no tuning an output sequential file.


Thoughts?
-Ken

Kenneth Bland