Data Transfer rate is too low

Nripendra Chand
Premium Member
Posts: 196
Joined: Tue Nov 23, 2004 11:50 pm
Location: Sydney (Australia)

Data Transfer rate is too low

Post by Nripendra Chand »

Hi all,

In one of my jobs, data flows from a Sequential File stage to a Hashed File stage. It is a simple one-to-one mapping. I have specified the following settings in the Hashed File stage:
Create Hash File: Checked
Clear before creation: Checked
The rest of the options are left at their defaults.
But the transfer rate between these two stages is only 300 to 400 rows per second, which is too low. There are 100,000 records in the sequential file.
I'm unable to find the root cause.
I need some suggestions to improve performance in this case.

Regards,
Nripendra Chand
Luk
Participant
Posts: 133
Joined: Thu Dec 02, 2004 8:35 am
Location: Poland

Post by Luk »

Maybe there are some warnings being generated in your log. I have noticed that generating many warnings decreases performance.

LUK
Nripendra Chand
Premium Member
Posts: 196
Joined: Tue Nov 23, 2004 11:50 pm
Location: Sydney (Australia)

Post by Nripendra Chand »

No, just 2 warnings are coming, and I have set the auto-purge log option for this job. Even then the data transfer rate is too low.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

When you create the hashed file, use a larger minimum modulus - a prime number such as 997 should be better than the default of 1. If performance is very important here, you can go a step further and choose a specific hashing algorithm and file type; there are other threads on this forum which describe that in better detail than I can.
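
For illustration, a minimal sketch of what that pre-creation could look like from a DataStage BASIC routine, assuming an account-level hashed file; the name CUSTOMER_HASH and the figure 997 are just placeholders, and the same minimum modulus can equally be entered in the Hashed File stage's create-file options:

    * Sketch only: pre-create a dynamic hashed file with a larger minimum modulus.
    * CUSTOMER_HASH and the value 997 are illustrative, not taken from the job above.
    Command = "CREATE.FILE CUSTOMER_HASH DYNAMIC MINIMUM.MODULUS 997"
    Call DSExecute("UV", Command, Output, SysRet)
    If SysRet <> 0 Then
       Call DSLogWarn("CREATE.FILE failed: " : Output, "CreateHashedFile")
    End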
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

First, what rate do you get if you add a constraint of @FALSE into the Transformer stage? This will prevent any rows from being loaded into the hashed file, but will demonstrate an upper limit on the rate at which DataStage can read a sequential file on your platform.
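
To show how that comparison might be read, a small worked sketch; the 2000 rows/sec read-only figure is purely assumed for illustration, and only the 350 rows/sec comes from the rates quoted above:

    * Hypothetical figures to show how to interpret the @FALSE test.
    ReadOnlyRate = 2000                 ;* rows/sec with the @FALSE constraint (assumed result)
    ActualRate = 350                    ;* rows/sec currently observed when writing the hashed file
    ReadMsPerRow = 1000 / ReadOnlyRate  ;* about 0.5 ms spent reading each row
    TotalMsPerRow = 1000 / ActualRate   ;* about 2.9 ms per row end to end
    WriteMsPerRow = TotalMsPerRow - ReadMsPerRow  ;* roughly 2.4 ms per row spent in the hashed file write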

Second, remove the constraint, and enable write cache in the Hashed File stage. This eliminates the need to write to random locations on disk; all writes are to memory (provided the hashed file is not too large, and at only 1 lakh (100,000) records it shouldn't be), then the groups (pages) of the hashed file are flushed to disk in sequential order.

Finally, as Arnd suggested, calculate the desired final size of the hashed file and specify that size in the creation options. You can use the Hashed File Calculator on your DataStage CD (unsupported options) to help you to calculate the size, and the tuning parameters. You might also consider using a static hashed file, depending on the pattern of key values in your data.

Contrary to what Arnd suggested, a power of 2 is actually the best choice for MINIMUM.MODULUS with dynamic hashed files. A prime number is generally thought to be best for static hashed files, though it is arguable that a power or multiple of 10 works best for Type 2 (and for dynamic hashed files with SEQ.NUM hashing algorithm). All of that assumes that the record sizes are reasonably similar.
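
As a rough sizing sketch, using one common rule of thumb and an assumed 100-byte average record size (substitute your own average of key plus data plus per-record overhead):

    * Rough sizing sketch in DataStage BASIC terms; only the 100,000 row count comes from this thread.
    RecordCount = 100000       ;* rows to be loaded
    AvgRecBytes = 100          ;* assumed average record size (key + data + overhead)
    GroupBytes = 2048          ;* GROUP.SIZE 1 = 2048-byte groups
    SplitLoad = 0.8            ;* default SPLIT.LOAD of 80%
    MinModulus = INT((RecordCount * AvgRecBytes) / (GroupBytes * SplitLoad)) + 1
    * With these assumptions MinModulus works out to about 6104 groups;
    * per the advice above, round up to a power of 2 (8192) for a dynamic file,
    * or pick a prime near the target for a static file.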
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.