write hash file in v8.0

Post questions here related to DataStage Server Edition, covering such areas as Server job design, DS BASIC, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Cr.Cezon
Participant
Posts: 101
Joined: Mon Mar 05, 2007 4:59 am
Location: Madrid


Post by Cr.Cezon »

We are having problems with hashed files in Server 8.0.

DS is installed on Unix with RAID5.

The jobs have a performance problem: a job starts writing at about 3,000 rows/sec, but when the hashed file reaches 260 MB it drops to about 200 rows/sec. After roughly 10 seconds it returns to good performance again.

Has anyone else seen this problem?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Hashed files default to Dynamic (Type 30) files. This means that as they grow they need to "split" groups, and this overhead can cause slowdowns. Also, the method used to display rows/second depends upon buffering and is notoriously misleading at runtime.

If you know approximately how many records your hashed file is going to hold, you can pre-size the file by defining the MINIMUM.MODULUS. Setting this to a high value will speed up writes, but be aware that the file then pre-allocates that disk space even while it contains no records.

I would run your job and, once it has completed, go into the Administrator or command-line TCL environment and enter the command "ANALYZE.FILE {yourfile}". The value reported for "No. of groups (modulus)" is what you can use as the MINIMUM.MODULUS value. This can be set in the Hashed File stage's create-file options.
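
For example, here is a hypothetical TCL session; the file name MYHASH and every figure shown are invented for illustration, and the real output contains more lines than this:

    >ANALYZE.FILE MYHASH
    File name ..................   MYHASH
    File type ..................   DYNAMIC
    Hashing Algorithm ..........   GENERAL
    No. of groups (modulus) ....   25471 current ( minimum 1 )
    Number of records ..........   1000000
    Group size .................   2048 bytes

In this case 25471 is the figure to enter as the minimum modulus in the stage's create-file options, so that the file is built at roughly its final size and never has to split while loading.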
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

This is not a problem; it is perfectly normal behaviour. The threshold (one of them) is actually 256 MB. These "slowdowns" are caused by a "flush to disk" being triggered, in order to minimize potential data loss. If you are using a write cache, increasing its size will change the point at which this flush is triggered.
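
If you want to experiment with this, the write cache is controlled in two places. These menu and option names are from memory of the 7.x/8.0 clients, so treat them as approximate:

    DataStage Administrator -> Project Properties -> Tunables tab
        Hashed file stage: Write cache size (MB)    (default 128)
    Job design -> Hashed File stage -> Inputs -> General
        [x] Allow stage write cache

Raising the cache size only moves the point at which the mid-job flush is triggered; it does not remove the flush.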

You will also see, at the end of the job, that the row count ceases to increase but rows/sec decreases. This is the remaining rows being flushed to disk, and it is one of many reasons that rows/sec is an almost meaningless metric of "performance". The larger your write cache, the longer DataStage must spend flushing it to disk while the clock (the "seconds" part of rows/sec) keeps running.
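
A worked example may help (all numbers here are invented). Suppose a job writes 1,000,000 rows in 300 seconds of actual processing, then spends a further 60 seconds flushing the write cache while the row count stands still:

    while rows flow:   1,000,000 rows / 300 s ≈ 3,333 rows/sec
    after the flush:   1,000,000 rows / 360 s ≈ 2,778 rows/sec

The displayed rate falls at the end even though the job is doing exactly the work it should.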
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Cr.Cezon
Participant
Posts: 101
Joined: Mon Mar 05, 2007 4:59 am
Location: Madrid

Post by Cr.Cezon »

Is this behaviour new in version 8.0?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

This is not new behaviour in version 8.0. At my current contract I see exactly the same behaviour in version 7.5.1.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Cr.Cezon
Participant
Posts: 101
Joined: Mon Mar 05, 2007 4:59 am
Location: Madrid

Post by Cr.Cezon »

Thanks a lot.