HFC hashed file calculations
Posted: Thu Dec 06, 2012 3:19 am
I am optimising some jobs and have noticed that by far the best performance comes from creating the hashed file (type 2) before running the job.
In development I know how many rows will be in each hashed file, so I can use the parameters generated by HFC.exe. In production I won't know the row counts, so I have to "guess" the best settings for the hashed files, or make the files very large to accommodate any errors.
For example, an average row size of 40 for 8.4m rows gives me:
2 229693 1 32BIT
But when I run in production, this could be more or less.
Question
What is the relationship between the average row size (40), the number of rows (8.4m) and the value 229693? Is there any way I can write a routine to calculate this?
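To show the kind of routine I have in mind, here is a rough Python sketch. It is not HFC's formula, which I don't know. It only assumes that the modulus is a prime number of groups large enough to hold the data, that a separation of 1 means 512-byte groups, and that each row carries some fixed overhead and the groups are only partly filled. The names `estimate_modulus`, `overhead_bytes` and `fill_factor` are my own inventions, and the overhead and fill values in the usage line are placeholders, not HFC settings.

```python
import math


def is_prime(n: int) -> bool:
    """Trial-division primality test; fine for modulus-sized numbers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def next_prime(n: int) -> int:
    """Smallest prime greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n


def estimate_modulus(rows: int, avg_row_bytes: int, *, separation: int,
                     overhead_bytes: int, fill_factor: float) -> int:
    """Guess a modulus from the expected data volume.

    Assumptions (mine, not confirmed against HFC):
      - group size = separation * 512 bytes
      - every row costs avg_row_bytes + overhead_bytes
      - groups are filled to fill_factor (0 < fill_factor <= 1)
    """
    group_bytes = separation * 512
    total_bytes = rows * (avg_row_bytes + overhead_bytes)
    groups_needed = math.ceil(total_bytes / (group_bytes * fill_factor))
    return next_prime(max(groups_needed, 1))


# Placeholder overhead and fill values, chosen only to exercise the function.
print(estimate_modulus(8_400_000, 40, separation=1,
                       overhead_bytes=12, fill_factor=0.8))
```

If the real relationship has this shape, I could feed the production row count into the routine at run time and create the file with the result, rather than oversizing it.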
Thanks