Poor performance from hashed file following RedHat upgrade

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

PaulS
Premium Member
Posts: 45
Joined: Fri Nov 05, 2010 4:38 am

Poor performance from hashed file following RedHat upgrade

Post by PaulS »

Hi,

We have recently upgraded the OS from Red Hat 5.5 to 5.9, and one job in particular has dramatically slowed down. Lots of stages write to one hashed file, and I'm getting poor performance from that hashed file.

It was being created as dynamic, so I've used the HFC to get the settings for a static file, which were:
CREATE.FILE HashFileName 5 589163 1 32BIT
Unfortunately that was worse.

Here are the stats from before testing, i.e. while it was still a dynamic file:

Code: Select all

ANALYZE.FILE HashFileName STATS
File name .................. HashFileName
Pathname ................... HashFileName
File Type .................. DYNAMIC
NLS Character Set Mapping .. NONE
Hashing Algorithm .......... GENERAL
No. of groups (modulus) .... 402960 current ( minimum 1, 3 empty,
                                          110195 overflowed, 5151 badly )
Number of records .......... 7370417
Large record size .......... 1628 bytes
Number of large records .... 0
Group size ................. 2048 bytes
Load factors ............... 90% (split), 50% (merge) and 80% (actual)
Total size ................. 1061656576 bytes
Total size of record data .. 178568525 bytes
Total size of record IDs ... 489893515 bytes
Unused space ............... 393190440 bytes
Total space for records .... 1061652480 bytes
File name .................. HashFileName
                             Number per group ( total of 402960 groups )
                             Average    Minimum    Maximum     StdDev
Group buffers ..............    1.28          1          3       0.45
Records ....................   18.29          1         60       8.31
Large records ..............    0.00          0          0       0.00          
Data bytes .................  443.14         18       1469     202.04
Record ID bytes ............ 1215.74         53       3939     555.54
Unused bytes ...............  953.41         12       2116     478.20
Total bytes ................ 2612.29       2048       6144       0.00

                             Number per record ( total of 7370417 records )
                             Average    Minimum    Maximum     StdDev
Data bytes .................   24.23         21         33       2.07
Record ID bytes ............   66.47         47         87       8.38    
File name .................. HashFileName
                         Histogram of records and ID lengths

                                                                      100.0%
    Bytes ------------------------------------------------------------------

  up to 4|
  up to 8|
 up to 16|
 up to 32|  
 up to 64|
up to 128| >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
up to 256|
up to 512|
 up to 1k|
 up to 2k|
 up to 4k|
 up to 8k|
up to 16k|
     More|
          ------------------------------------------------------------------
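
As a rough sanity check on the static attempt, those figures suggest it could never have fit. The sketch below is just back-of-envelope arithmetic (plain Python, nothing DataStage-specific); the one assumption is that a static file's group size is separation * 512 bytes, so modulus 589163 with separation 1 means 512-byte groups.

Code: Select all

# Back-of-envelope check using the ANALYZE.FILE figures above.
# Assumption: a static file's group size is separation * 512 bytes.
records    = 7_370_417
data_bytes = 178_568_525      # "Total size of record data"
id_bytes   = 489_893_515      # "Total size of record IDs"

static_modulus = 589_163      # from CREATE.FILE HashFileName 5 589163 1
static_group   = 1 * 512      # separation 1 -> 512-byte groups (assumed)

payload    = data_bytes + id_bytes       # ~668 MB of keys plus data
per_record = payload / records           # ~91 bytes before any overhead
per_group  = payload / static_modulus    # ~1135 bytes hashed into each group

print(f"avg record (data + key): {per_record:.0f} bytes")
print(f"avg payload per group  : {per_group:.0f} bytes vs {static_group} available")
# Every group would have to spill into overflow buffers, which would
# explain why the static 589163/1 file was slower, not faster.
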
The hash file looks like this...

Code: Select all

HashFileName
Column name   Key  SQL Type    Length  Scale  Nullable
FIELD1         Y   Varchar          6          No
FIELD2         Y   Varchar         11          Yes
FIELD3         Y   Decimal         38          Yes
FIELD4         Y   Varchar         64          Yes
FIELD5         Y   Varchar         10          Yes
FIELD6         Y   Varchar         15          Yes
FIELD7         Y   Varchar         50          Yes
FIELD8         Y   Varchar         12          Yes
FIELD9         Y   Varchar          3          No
FIELD10        N   Decimal         38      2   Yes
FIELD11        N   Decimal         38          Yes
FIELD12        N   Decimal         38      2   Yes
Any help on this would be very much appreciated.

Thanks in advance

Paul
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Your keys are large and about a quarter of your groups are overflowed. Do you know which of your key components changes the most? That would help in choosing a static hashing algorithm (a Type 5 file, for example, uses just the last 4 bytes of FIELD9 to hash the records into groups).
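
One quick way to check, if you can dump a delimited sample of the keys to a flat file, is to count the distinct values per component. A minimal sketch in Python (keys_sample.txt is a hypothetical tab-delimited extract of the nine key fields, one record per line):

Code: Select all

# Count distinct values per key component from a sampled key extract.
# keys_sample.txt is a hypothetical tab-delimited dump of the 9 key fields.
from collections import defaultdict

distinct = defaultdict(set)
with open("keys_sample.txt") as f:
    for line in f:
        for i, value in enumerate(line.rstrip("\n").split("\t")):
            distinct[i].add(value)

for i in sorted(distinct):
    print(f"FIELD{i + 1}: {len(distinct[i])} distinct values")
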

You could create the file as Type 30 (dynamic) but specify a MINIMUM.MODULUS bigger than your current value, preferably a prime number, e.g. 500009.
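
If you want to sanity-check whatever modulus you pick, simple trial division is plenty fast at this size. A small sketch (500009 itself is prime):

Code: Select all

# Simple primality check for choosing a MINIMUM.MODULUS value.
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True

def next_prime(n: int) -> int:
    # First prime at or above n.
    while not is_prime(n):
        n += 1
    return n

print(is_prime(500009))     # True -- so 500009 is a sensible choice
print(next_prime(600_000))  # handy if you later want a bigger modulus
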

Also, is this file in the Project directory with the other hashed files, or on a different (and possibly slower) mount point?
PaulS
Premium Member
Posts: 45
Joined: Fri Nov 05, 2010 4:38 am

Post by PaulS »

Thanks ArndW - I've taken your advice and recreated it as a dynamic file with a minimum modulus of 500009:
CREATE.FILE FILENAME DYNAMIC MINIMUM.MODULUS 500009 32BIT

The file is in the project directory, but the vpar uses the same area/discs on the SAN, so I'm not going to try moving it just yet.

I've also enabled write caching on each of the stages that write to the hashed file. I'm rerunning the tests now... so far so good!

Thanks again
Paul