Slow Writing to Hashed files
Moderators: chulett, rschirm, roy
Sorry, I did not expand on the idea that I had earlier posted about using CRC32. I just assumed you would know about it. Anyways, Kumar has already explained that you would need to compute a CRC32 on the concatenated keys, both on the input stream and the reference and perform the match.
Can you please let us know if it helped? But of course, the 'cache flush' issue pointed out by Kenneth would mean that it may not help you significantly.
gateleys
Can you please let us know if it helped? But of course, the 'cache flush' issue pointed out by Kenneth would mean that it may not help you significantly.
gateleys
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
I wouldn't even consider an approach like this, it's not an appropriate use of CRC32... in my opinion.kumar_s wrote:You are right, CRC32 will give you unique single field value for combination of several fields. So you can reduce the over head of doing the lookup for all the fields. But you need to make sure, while preparing the CRC field, the datatype and length should be same, else you will end up in getting different value and lookup mismatch.
Besides, you've said your problem is with writing to the hashed files, not reading from them. And you've said you only have one key field in the hashed files - not four - and you use @INROWNUM for the key.
Besides the fact that I have no idea how you will manage to use a hashed file that has a row number as a key, your issue as stated (job is not finishing writing quickly to the hashed files) has got nothing to do with your keys. It is the fact that you have Write Cache enabled and you probably haven't sized your hashed files properly.
trobinson wrote:Have you analyzed the hashed files' structure? The Modulo and separation? Does your Create File being checked still create the default dynamic hashed files or have you tuned it?
Did you ever answer these questions?
![Confused :?](./images/smilies/icon_confused.gif)
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Premium Member
- Posts: 273
- Joined: Wed Oct 18, 2006 12:20 pm
- Location: Porto
this is the command which is creating the Hashed file:trobinson wrote:Have you analyzed the hashed files' structure? The Modulo and separation? Does your Create File being checked still create the default dynamic hashed files or have you tuned it?
Code: Select all
D:\Ascential\DataStage\Engine\bin\mkdbfile E:\Input\LineKeys_RowNum_3 30 1 4 20 50 80 1628
Regarding this, the reason to go for such approach is to reduce the number of fields in a hashed file, coz the hashed file was crossing 2GB limit ( when 8 keys + 8 data flds were written into a single hashed file )... so I stored the 8 keys along with the @INROWNUM in one hashed file and in the remaining Linex_INSTANCE hashed file, the @INROWNUM is key and the rest of the 8 data flds are put there....chulett wrote:Besides the fact that I have no idea how you will manage to use a hashed file that has a row number as a key,
Regarding this, I request some guidance as to how may I size and tune the Hashed file propery....chulett wrote:your issue as stated (job is not finishing writing quickly to the hashed files) has got nothing to do with your keys. It is the fact that you have Write Cache enabled and you probably haven't sized your hashed files properly.
Share to Learn, and Learn to Share.
Why even start a discussion about a composite key? The OP has stated the hashed files all have a single key field.
I would simply concatenate the multiple fields together if I wanted to build a composite key. But again, I wouldn't bother for 4 key fields, that's not going to help with build speed of the hashed file. And I doubt it would substantially affect the lookup speed, either. I recall doing that in older versions, but haven't seen a need to even consider it in the last several years.
I would simply concatenate the multiple fields together if I wanted to build a composite key. But again, I wouldn't bother for 4 key fields, that's not going to help with build speed of the hashed file. And I doubt it would substantially affect the lookup speed, either. I recall doing that in older versions, but haven't seen a need to even consider it in the last several years.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
I might be hijacking the thread now.
But Iam designing one such job with Database lookup. Large volume with 4 keys. For lookup purpose, Iam going to unload only the new created artificial key. I felt its saving space as well as more efficient. Hence going for synthetic key. Would you recommend that.
![Rolling Eyes :roll:](./images/smilies/icon_rolleyes.gif)
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Ah... so now the complete picture of what you are doing is starting to emerge. I don't have time for a good answer here, need to get my butt out the door.
Your Minimum Modulus of 1 (the default) for the hashed files is way too low for your volume. And breaking this one file into two because of the 2GB issue is the hard way to 'fix' the problem, why not simply use a 64BIT hashed file?
I'll leave it to others to pass on the tuning advice.
You also mentioned something about 4 keys, not 8. Please don't tell me you are using 8 to build the hashed file and then 4 for the lookup...
Your Minimum Modulus of 1 (the default) for the hashed files is way too low for your volume. And breaking this one file into two because of the 2GB issue is the hard way to 'fix' the problem, why not simply use a 64BIT hashed file?
I'll leave it to others to pass on the tuning advice.
You also mentioned something about 4 keys, not 8. Please don't tell me you are using 8 to build the hashed file and then 4 for the lookup...
Last edited by chulett on Fri Mar 09, 2007 9:03 am, edited 1 time in total.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Premium Member
- Posts: 273
- Joined: Wed Oct 18, 2006 12:20 pm
- Location: Porto
Here some numbers from the performance statistics:
http://spreadsheets.google.com/pub?key= ... N11Rci_7Mg
Will I be able to use the 64Bit Hashed file on the Windws machine.. I am pretty confused in this regards.. please advice...
If created , will I be able to use the hashed file in lookups as normal hashed file or some other care will be reqd ??
http://spreadsheets.google.com/pub?key= ... N11Rci_7Mg
Will I be able to use the 64Bit Hashed file on the Windws machine.. I am pretty confused in this regards.. please advice...
If created , will I be able to use the hashed file in lookups as normal hashed file or some other care will be reqd ??
Share to Learn, and Learn to Share.
If 'synthetic' means 'generated using CRC32' - don't. Search for posts by mhester mentioning CRC32 - he wrote the utility for Ascential and has very firm opinions on that subject.kumar_s wrote:I might be hijacking the thread now.But Iam designing one such job with Database lookup. Large volume with 4 keys. For lookup purpose, Iam going to unload only the new created artificial key. I felt its saving space as well as more efficient. Hence going for synthetic key. Would you recommend that.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
-
- Premium Member
- Posts: 273
- Joined: Wed Oct 18, 2006 12:20 pm
- Location: Porto
.ray.wurlod wrote:Use the Hashed File Calculator to tune the mkdbfile command. This utility is on your DataStage clients CD in the Utilities\Unsupported folder. ...
Dear Ray,
I think this is my weak area.. using HFC...
coz I dont have a clear understanding of how to determine the record size... which is an important parameter for HFC...
![Sad :(](./images/smilies/icon_sad.gif)
Share to Learn, and Learn to Share.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact: