Slow Writing to Hashed files

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

Sorry, I did not expand on the idea I had earlier posted about using CRC32; I just assumed you would know about it. Anyway, Kumar has already explained it: you would need to compute a CRC32 on the concatenated keys, on both the input stream and the reference, and perform the match on that value.
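A minimal sketch of that idea, assuming the CRC32() function available in Server jobs and made-up column names; the same derivation has to be used when loading the reference hashed file and on the stream link, with every field trimmed and formatted identically on both sides:

Code: Select all

* Illustrative only - KEY1..KEY4 are made-up column names.
* Key derivation when writing the reference hashed file:
CRC32(Trim(Ref.KEY1) : "|" : Trim(Ref.KEY2) : "|" : Trim(Ref.KEY3) : "|" : Trim(Ref.KEY4))

* The identical derivation for the lookup key on the stream link:
CRC32(Trim(In.KEY1) : "|" : Trim(In.KEY2) : "|" : Trim(In.KEY3) : "|" : Trim(In.KEY4))

The delimiter stops 'AB' : 'C' and 'A' : 'BC' from hashing to the same value, which matters even more once several fields are collapsed into one checksum.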

Can you please let us know if it helped? Of course, the 'cache flush' issue that Kenneth pointed out may mean it does not help you significantly.

gateleys
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

With write cache disabled you will (I predict) see a slower load rate (than with write cache enabled) that doesn't decline suddenly at the end.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

kumar_s wrote:You are right, CRC32 will give you a unique single-field value for a combination of several fields, so you can reduce the overhead of doing the lookup on all the fields. But you need to make sure that, while preparing the CRC field, the data type and length are the same on both sides; otherwise you will end up with different values and a lookup mismatch.
I wouldn't even consider an approach like this; it's not an appropriate use of CRC32... in my opinion.

Besides, you've said your problem is with writing to the hashed files, not reading from them. And you've said you only have one key field in the hashed files - not four - and you use @INROWNUM for the key.

Besides the fact that I have no idea how you will manage to use a hashed file that has a row number as a key, your issue as stated (the job does not finish writing to the hashed files quickly) has nothing to do with your keys. It is the fact that you have Write Cache enabled and you probably haven't sized your hashed files properly.
trobinson wrote:Have you analyzed the hashed files' structure? The modulus and separation? Does having Create File checked still create the default dynamic hashed files, or have you tuned it?

Did you ever answer these questions? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Craig, though it's not the right use of CRC32, wouldn't it be efficient to follow that approach? Do you suspect the chance of a repeated value would cause some error?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

trobinson wrote:Have you analyzed the hashed files' structure? The modulus and separation? Does having Create File checked still create the default dynamic hashed files, or have you tuned it?
This is the command that creates the hashed file:

Code: Select all

D:\Ascential\DataStage\Engine\bin\mkdbfile E:\Input\LineKeys_RowNum_3 30 1 4 20 50 80 1628 
I have not tuned the command for creating the hashed file. Please suggest how I might attempt to tune it.
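(For reference: in that command, the 1 that follows the file type 30 is the Minimum Modulus that chulett and Ken talk about raising further down the thread; the remaining figures are whatever the untuned Create File option generates. A hedged sketch of a tuned command follows, with a made-up modulus; the real value would come from the Hashed File Calculator mentioned later in the thread.)

Code: Select all

REM Illustrative only - 750000 is a made-up minimum modulus, not a recommendation.
REM Everything else is left exactly as the untuned command generated it.
D:\Ascential\DataStage\Engine\bin\mkdbfile E:\Input\LineKeys_RowNum_3 30 750000 4 20 50 80 1628

As Ken notes below, the existing file has to be removed from disk before it is recreated.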
chulett wrote:Besides the fact that I have no idea how you will manage to use a hashed file that has a row number as a key,
Regarding this, the reason for going with such an approach is to reduce the number of fields in a hashed file, because the hashed file was crossing the 2GB limit (when 8 keys + 8 data fields were written into a single hashed file). So I stored the 8 keys along with @INROWNUM in one hashed file, and in the remaining Linex_INSTANCE hashed file @INROWNUM is the key and the rest of the 8 data fields are stored.

chulett wrote:your issue as stated (job is not finishing writing quickly to the hashed files) has got nothing to do with your keys. It is the fact that you have Write Cache enabled and you probably haven't sized your hashed files properly.
Regarding this, I would appreciate some guidance on how to size and tune the hashed file properly.
Share to Learn, and Learn to Share.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Why even start a discussion about a composite key? The OP has stated the hashed files all have a single key field.

I would simply concatenate the multiple fields together if I wanted to build a composite key. But again, I wouldn't bother for 4 key fields; that's not going to help with the build speed of the hashed file, and I doubt it would substantially affect the lookup speed either. I recall doing that in older versions, but haven't seen a need to even consider it in the last several years.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

I might be hijacking the thread now. :roll: But I am designing one such job with a database lookup: large volume, with 4 keys. For lookup purposes, I am going to unload only the newly created artificial key. I felt it saves space as well as being more efficient, hence going for a synthetic key. Would you recommend that?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Remove the hashed files from disk, set the minimum modulus, turn off write-delay cache. Now run your job.
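A hedged sketch of those three steps against the file from asitagrawal's earlier post; the modulus figure is the same made-up one used above, and the exact wording of the cache option varies a little between DataStage versions:

Code: Select all

REM 1. Remove the existing hashed file from disk (a type 30 hashed file is a directory).
rmdir /s /q E:\Input\LineKeys_RowNum_3

REM 2. Recreate it with a realistic minimum modulus instead of the default of 1
REM    (750000 is illustrative; size it with the Hashed File Calculator).
D:\Ascential\DataStage\Engine\bin\mkdbfile E:\Input\LineKeys_RowNum_3 30 750000 4 20 50 80 1628

REM 3. In the Hashed File stage, untick the write cache (write-delay cache) option,
REM    then rerun the job.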
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ah... so now the complete picture of what you are doing is starting to emerge. I don't have time for a good answer here; I need to get my butt out the door.

Your Minimum Modulus of 1 (the default) for the hashed files is way too low for your volume. And breaking this one file into two because of the 2GB issue is the hard way to 'fix' the problem; why not simply use a 64BIT hashed file?
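For anyone wondering how the 64BIT route is usually done: the commonly described approach is to convert the file from the engine's command line (Administrator client > Command). A hedged sketch, assuming the UniVerse-style SETFILE and RESIZE commands behave here as documented elsewhere; verify the syntax on your own release before relying on it:

Code: Select all

REM Create a VOC pointer to the pathed hashed file, then convert it to 64-bit addressing.
REM The VOC name LineKeys_RowNum_3 is illustrative.
SETFILE E:\Input\LineKeys_RowNum_3 LineKeys_RowNum_3 OVERWRITING
RESIZE LineKeys_RowNum_3 * * * 64BIT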

I'll leave it to others to pass on the tuning advice.

You also mentioned something about 4 keys, not 8. Please don't tell me you are using 8 to build the hashed file and then 4 for the lookup...
Last edited by chulett on Fri Mar 09, 2007 9:03 am, edited 1 time in total.
-craig

"You can never have too many knives" -- Logan Nine Fingers
asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

Here are some numbers from the performance statistics:

http://spreadsheets.google.com/pub?key= ... N11Rci_7Mg

Will I be able to use a 64-bit hashed file on the Windows machine? I am pretty confused in this regard... please advise.

If created, will I be able to use the hashed file in lookups like a normal hashed file, or will some other care be required?
Share to Learn, and Learn to Share.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

kumar_s wrote:I might be hijacking the thread now. :roll: But I am designing one such job with a database lookup: large volume, with 4 keys. For lookup purposes, I am going to unload only the newly created artificial key. I felt it saves space as well as being more efficient, hence going for a synthetic key. Would you recommend that?
If 'synthetic' means 'generated using CRC32' - don't. Search for posts by mhester mentioning CRC32 - he wrote the utility for Ascential and has very firm opinions on that subject.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Yeah, I think I was even part of one such discussion. In fact, I was the victim, and my comments were criticized to hell. I am curious: if CRC32 doesn't fit this use, where else is it used?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Use the Hashed File Calculator to tune the mkdbfile command. This utility is on your DataStage client CD, in the Utilities\Unsupported folder.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

ray.wurlod wrote:Use the Hashed File Calculator to tune the mkdbfile command. This utility is on your DataStage clients CD in the Utilities\Unsupported folder. ...

Dear Ray,

I think this is my weak area... using the HFC... because I don't have a clear understanding of how to determine the record size, which is an important parameter for the HFC. :(
Share to Learn, and Learn to Share.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Total all the characters in a row and add 1 for each column. There are no data types - numbers are stored as character strings.
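A worked example with made-up figures, just to show the arithmetic; the 2048 bytes is the group size of a default dynamic file and 80% its default split load, so this is only a rough starting point compared with what the HFC works out:

Code: Select all

16 columns totalling 240 characters of data  ->  240 + 16 = 256 bytes per record
5,000,000 rows x 256 bytes                   ->  about 1.28 GB of data
1,280,000,000 / (2048 x 0.80)                ->  minimum modulus of roughly 780,000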
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.