I am optimising some jobs and have noticed that by far the best performance comes from creating the hashed file (type 2) before running the job.
I am using the parameters generated by HFC.exe. In development I know how many rows will be in each hashed file. In production I won't know, so I have to "guess" the best settings for the hashed files, or make the files very large to accommodate any errors.
For example an average row size of 40 for 8.4m lines gives me
2 229693 1 32BIT
But when I run in production, this could be more or less.
Question
What is the relationship between the average row size (40), the number of rows (8.4m) and the value 229693? Is there any way I can write a routine to calculate this?
Thanks
HFC hashed file calculations
Colin Larcombe
-------------------
Certified IBM Infosphere Datastage Developer
Hi,
Generally, the most significant impact on performance comes from the number of groups built when you create the file.
The number of groups required is the number of records you are about to process divided by the number of rows that fit in one group (the unit the hashed file is built from). Each hashed file group can be 2k or 4k (group size 1 or 2).
Having done that calculation, you can set the group count properly.
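As a rough illustration of the division described above, here is a minimal Python sketch. It is not the formula HFC.exe uses and is not expected to reproduce the 229693 figure. The function name `estimate_modulo`, the `headroom` factor and the rounding up to a prime are assumptions added for illustration; only the "rows divided by rows per group" rule and the 2k/4k group sizes come from this thread. It also ignores any per-record storage overhead.

```python
import math


def is_prime(n: int) -> bool:
    """Trial-division primality test, adequate for modulo-sized numbers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def next_prime(n: int) -> int:
    """Smallest prime greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n


def estimate_modulo(row_count: int, avg_row_bytes: int,
                    group_bytes: int = 2048, headroom: float = 1.0,
                    round_to_prime: bool = True) -> int:
    """Estimate a group count (modulo) for a hashed file.

    group_bytes: 2048 or 4096, per the group sizes mentioned in the thread.
    headroom: hypothetical safety factor for unknown production volumes.
    round_to_prime: assumption; a prime modulo is a common rule of thumb.
    """
    # How many average-sized rows fit in one group (at least one).
    rows_per_group = max(1, group_bytes // avg_row_bytes)
    # Rows divided by rows-per-group, rounded up, with optional headroom.
    groups = max(1, math.ceil(row_count * headroom / rows_per_group))
    return next_prime(groups) if round_to_prime else groups


if __name__ == "__main__":
    print(estimate_modulo(8_400_000, 40, round_to_prime=False))
```

With 8.4m rows of 40 bytes and a 2k group, 51 rows fit in a group, so the unrounded estimate is 164706 groups. Treat it as a starting point to monitor and adjust rather than a definitive setting.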
In my experience, once you are using a disk storage machine instead of local disks, there is no real benefit to using static hashed files over dynamic ones, but maybe others have a different experience.
You can get an estimated starting point if there is a real working process,
or get an estimate that you then monitor and change if need be.
IHTH (I Hope This Helps),
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Hi Roy,
I am trying to understand what you mean by groups.
The three fields returned by the HFC are
FileType
Modulo
Separation
Is the separation what you mean by a group?
As the separation and file type will remain static, what I need to (roughly) calculate is the modulo.
How can I achieve this? What is it a function of?
As for the disk storage, we are not that advanced here (yet!); we still use local Windows disks. If you have any recommendations for alternative disk storage I am all ears.
Using static HFs instead of dynamic ones, I am roughly doubling the throughput.
Thanks
Colin Larcombe
-------------------
Certified IBM Infosphere Datastage Developer