Not ideally; VarChars on average are shorter than the maximum length specified. But if you total the lengths, you'll err on the side of caution, and over-size your hashed file. This is far better than under-sizing it.
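As a rough illustration of totalling the declared lengths, the sketch below sums some column widths and multiplies by a row count. The column names, widths and row count are made up for the example, and the result is only an upper bound on data bytes, not a DataStage sizing formula.

```python
# Hypothetical column definitions: name -> declared maximum length in bytes.
columns = {"cust_id": 10, "cust_name": 60, "city": 40}

def max_record_bytes(cols):
    """Upper bound on one record's data size: every VarChar at full length."""
    return sum(cols.values())

def max_data_bytes(cols, row_count):
    """Upper bound on total data bytes for row_count records."""
    return max_record_bytes(cols) * row_count

print(max_record_bytes(columns))           # 110
print(max_data_bytes(columns, 1_000_000))  # 110000000
```

Because real VarChar values are usually shorter than declared, the true figure will be lower, which is the "err on the side of caution" effect described above.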
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I have a small doubt...
The data will not be evenly distributed across the 10 instances,
so if the hashed file is tuned for some arbitrary number of input records,
won't that affect performance? That is, the tuning will be good in one instance and poor in another...
kumar_s wrote: You are right, CRC32 will give you a unique single-field value for a combination of several fields, so you can reduce the overhead of doing the lookup on all the fields. But when preparing the CRC field you need to make sure the datatype and length are the same on both sides; otherwise you will get different values and the lookup will mismatch.
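To make the datatype and length point concrete, here is a minimal sketch in Python rather than DataStage BASIC. The `normalise` and `crc_key` helpers, the separator and the field widths are all assumptions for illustration; only `zlib.crc32` is a real library call.

```python
import zlib

SEP = "\x1f"  # unit separator; assumed never to occur in the data

def normalise(value, width):
    """Trim, then pad or cut to a fixed width so both sides of the lookup agree."""
    return str(value).strip().ljust(width)[:width]

def crc_key(fields, widths):
    """CRC32 over the normalised fields, joined with a separator."""
    joined = SEP.join(normalise(f, w) for f, w in zip(fields, widths))
    return zlib.crc32(joined.encode("utf-8"))

widths = [10, 20]
source_key = crc_key(["ACME ", "London"], widths)     # trailing space on field 1
lookup_key = crc_key(["ACME", "London   "], widths)   # trailing spaces on field 2
print(source_key == lookup_key)  # True: same key once lengths are normalised
```

Without the normalisation step, the differing padding would feed different bytes into the CRC and the two sides would very likely not match.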
Hi Kumar,
Thanks for suggesting the CRC32 generation. I also tried using CRC32 to generate a unique identifier for lookup purposes. The volume I am handling is around 8 million rows. I got distinct CRC32 values for almost all the records, except for some 6900 rows. Those rows have totally different field values but still end up generating the same CRC value. I am not sure whether I need to change something on the server side to get a unique value for each string that I process. Any help on this would be highly appreciated.
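CRC32 is not a suitable approach for your case. It produces only 32 bits, so any two different rows have roughly a 1 in 4 billion chance of sharing a value, and across 8 million rows that adds up to thousands of colliding pairs. Your 6900 rows look high, but they are in line with what that arithmetic predicts.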
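A back-of-the-envelope check of the collision count, assuming CRC32 behaves like a uniform 32-bit hash over these rows (an idealisation, and `expected_colliding_pairs` is just a name chosen for the sketch):

```python
def expected_colliding_pairs(rows, bits=32):
    """Birthday estimate: number of row pairs times the chance one pair collides."""
    pairs = rows * (rows - 1) // 2
    return pairs / 2 ** bits

# For 8 million rows and a 32-bit value this comes out at roughly 7,450 pairs.
print(round(expected_colliding_pairs(8_000_000)))
```

So a few thousand clashing rows at this volume does not point to a server-side setting; it is what a 32-bit value gives you.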
I apologize for the bad suggestion, if you followed the approach from my post.
As widely suggested, you could use a sequence key generator built on DataStage system variables such as @INROWNUM/@OUTROWNUM.
Or search for, and implement, your own hashing algorithm that produces more bytes.
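For the "more bytes" option, one possible sketch outside DataStage is a 64-bit key taken from a standard hash. This uses Python's `hashlib.blake2b`; the `wide_key` helper and the separator are assumptions for illustration, not something from the posts above.

```python
import hashlib

SEP = "\x1f"  # unit separator; assumed never to occur in the data

def wide_key(fields):
    """64-bit integer key derived from the joined fields via BLAKE2b."""
    joined = SEP.join(str(f) for f in fields).encode("utf-8")
    digest = hashlib.blake2b(joined, digest_size=8).digest()
    return int.from_bytes(digest, "big")

# The separator keeps ("ab", "c") and ("a", "bc") from hashing the same bytes.
print(wide_key(["ab", "c"]) != wide_key(["a", "bc"]))
```

A 64-bit value makes collisions at 8 million rows very unlikely, but it is still not a guarantee of uniqueness, and the unsigned result may not fit a signed 64-bit column. Only a generated sequence key gives a hard guarantee.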
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
I am also looking for a better approach to generating an SK based on one or many existing keys, at least where the whole aim is to create a single integer key from several Char fields in order to save space.
I was working on a similar project and noticed that loading data into one large hashed file took 10% of CPU. Loading 2 hashed files in parallel took 20% of CPU, and so on...