64Bit Hash file - Bad performance

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


eoyylo
Participant
Posts: 57
Joined: Mon Jun 30, 2003 6:56 am

64Bit Hash file - Bad performance

Post by eoyylo »

Hi,
I ran some tests to compare the performance of 32-bit hash files versus 64-bit hash files on an HP server.
The test consists of:

a flat file (1,400,595 records),
a hash file (1,400,595 records),
and another hash file holding the join of the two:

flat file ---+
             |
             +------> join -----> write hash file
             |
hash file ---+

For both the 64-bit and the 32-bit tests I used the same set of records.

I obtained the following values:

for 32-bit:
  flat file        - elapsed 10:07,    2307 rows/sec
  hash file        - elapsed 12:29,    1869 rows/sec
  hash file (join) - elapsed 6:48,     3432 rows/sec

for 64-bit:
  flat file        - elapsed 9:45,     2394 rows/sec
  hash file        - elapsed 23:27:18, 16 rows/sec
  hash file (join) - elapsed 48:37:54, 8 rows/sec



The 64-bit hash file is roughly 400 times slower than the 32-bit one.
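(From the figures above: on the join load, 3432 vs. 8 rows/sec is about a 430x slowdown, and on the plain hash load, 1869 vs. 16 rows/sec is about a 117x slowdown.)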

Can anyone explain the reason?

thanks.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

eoyylo

Exactly how did you create this 64-bit hash file? Hash file performance is based mostly on modulo size. There are many posts explaining this relationship. There is no way this file was sized properly. HASH.HELP and ANALYZE.FILE will tell you if your hash file is sized properly.
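To give a feel for the relationship (a rough back-of-the-envelope sketch, assuming a separation of 4, i.e. 2 KB groups): each group holds only about 14 records of 140 bytes, so roughly 1.4 million records need a prime modulo somewhere above 100,000. With a much smaller modulo, every group carries a long overflow chain that has to be scanned on each read and write, which is the kind of collapse your rows/sec figures show.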

Thanks Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
eoyylo
Participant
Posts: 57
Joined: Mon Jun 30, 2003 6:56 am

Post by eoyylo »

Hi Kim,
what exactly are HASH.HELP and ANALYZE.FILE?
Where can I find them?


I created the hash file with the command produced by HFC.EXE.

I calculated the average length of a record and used "no discernible pattern" for the key pattern.
The other options were:
modulo style = prime number
structure = static hashed
command = mkdbfile
addressing = 64 bit
average record size = 140


Thanks

Mario

quote: Originally posted by kduke
eoyylo

Exactly how did you create this 64-bit hash file? Hash file performance is based mostly on modulo size. There are many posts explaining this relationship. There is no way this file was sized properly. HASH.HELP and ANALYZE.FILE will tell you if your hash file is sized properly.

Thanks Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

eoyylo

If you telnet into the DataStage server you should get a login prompt. Log in with your normal user name and password. You should get the TCL prompt ">". This is the database command-line prompt. You need to LOGTO your project, then run HASH.HELP filename. This will give you a recommended modulo and separation. Then RESIZE filename to get a properly sized hash file.
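For illustration, the session looks roughly like this (MYPROJECT and MyHashFile are placeholder names; substitute the type, modulo and separation that HASH.HELP recommends):

>LOGTO MYPROJECT
>HASH.HELP MyHashFile
>ANALYZE.FILE MyHashFile
>RESIZE MyHashFile <type> <modulo> <separation>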

Thanks Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

The easiest way to do hash file configuration is to use the HFC tool in the utility folder on your installation CD. You can do a forum search for HFC to read other threads on this tool.

The HASH.HELP and ANALYZE.FILE commands are UniVerse database commands, and documentation is on the IBM web site; the version 9.6 manuals can be found at:
http://www-3.ibm.com/software/data/u2/p ... index.html

Have a look at the SQL Administration manual. There are also manuals here for UniVerse BASIC and UniVerse SQL.

Vincent McBurney
Data Integration Services
www.intramatix.com
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

My first question: why are you using 64-bit hash files? You would only need to go to this extreme if your hash file exceeds 2.2 gigabytes. You state 1.4 million rows, but how wide is this hash file?
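A quick check using the 140-byte average record length quoted earlier (ignoring group and key overhead): 1,400,595 rows x 140 bytes is roughly 196 MB, which is nowhere near the 2.2 gigabyte limit.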

You pay performance penalties for 64-bit hash files. I can't verify your performance metrics; the variables are numerous. Are you reading from and writing back to the same hash file? Did you use read-lock caching by accident? (This fills the internal lock table and progressively degrades job performance until you eventually lock the engine if improperly used - don't use it.) Are you using write-delay caching? Did you monitor CPU, memory, and disk usage during your benchmark? blah blah blah

Here's what I'd do if I were you. Try using a plain old 32-bit hash file. If you exceed 2.2 gigabytes on the DATA.30 file, then you should question whether you need all of the columns in your hash target file. If you could write the merged set out to a sequential file, you should do it! Plus, if you instantiate the job, you'll be able to realize greater throughput. If you write back to a hash file, you incur hashing overhead to park the data in the hash file, as well as "tuning". There's no tuning an output sequential file.


Thoughts?
-Ken

Kenneth Bland