Slow Writing to Hashed files

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

Sorry, I did not expand on the idea I had earlier posted about using CRC32; I just assumed you would know about it. Anyway, Kumar has already explained it: you would need to compute a CRC32 on the concatenated keys, on both the input stream and the reference, and perform the match on that value.
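A minimal sketch of that idea, assuming the CRC32() function available in Server jobs and made-up column names; the same derivation has to be used when loading the reference hashed file and on the stream link, with every field trimmed and formatted identically on both sides:

Code: Select all

* Illustrative only - KEY1..KEY4 are made-up column names.
* Key derivation when writing the reference hashed file:
CRC32(Trim(Ref.KEY1) : "|" : Trim(Ref.KEY2) : "|" : Trim(Ref.KEY3) : "|" : Trim(Ref.KEY4))

* The identical derivation for the lookup key on the stream link:
CRC32(Trim(In.KEY1) : "|" : Trim(In.KEY2) : "|" : Trim(In.KEY3) : "|" : Trim(In.KEY4))

The delimiter stops 'AB' : 'C' and 'A' : 'BC' from hashing to the same value, which matters even more once several fields are collapsed into one checksum.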

Can you please let us know if it helped? Of course, the 'cache flush' issue that Kenneth pointed out may mean it does not help you significantly.

gateleys
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

With write cache disabled you will (I predict) see a slower load rate (than with write cache enabled) that doesn't decline suddenly at the end.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

kumar_s wrote:You are right, CRC32 will give you a unique single-field value for a combination of several fields, so you can reduce the overhead of doing the lookup on all the fields. But you need to make sure that, while preparing the CRC field, the data type and length are the same on both sides; otherwise you will end up with different values and a lookup mismatch.
I wouldn't even consider an approach like this; it's not an appropriate use of CRC32... in my opinion.

Besides, you've said your problem is with writing to the hashed files, not reading from them. And you've said you only have one key field in the hashed files - not four - and you use @INROWNUM for the key.

Besides the fact that I have no idea how you will manage to use a hashed file that has a row number as a key, your issue as stated (the job does not finish writing to the hashed files quickly) has nothing to do with your keys. It is the fact that you have Write Cache enabled and you probably haven't sized your hashed files properly.
trobinson wrote:Have you analyzed the hashed files' structure? The modulus and separation? Does having Create File checked still create the default dynamic hashed files, or have you tuned it?

Did you ever answer these questions? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Craig, though it's not the right use of CRC32, wouldn't it be efficient to follow that approach? Do you suspect the chance of a repeated value would cause some error?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

trobinson wrote:Have you analyzed the hashed files' structure? The modulus and separation? Does having Create File checked still create the default dynamic hashed files, or have you tuned it?
This is the command that creates the hashed file:

Code: Select all

D:\Ascential\DataStage\Engine\bin\mkdbfile E:\Input\LineKeys_RowNum_3 30 1 4 20 50 80 1628 
I have not tuned the command for creating the hashed file. Please suggest how I might attempt to tune it.
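(For reference: in that command, the 1 that follows the file type 30 is the Minimum Modulus that chulett and Ken talk about raising further down the thread; the remaining figures are whatever the untuned Create File option generates. A hedged sketch of a tuned command follows, with a made-up modulus; the real value would come from the Hashed File Calculator mentioned later in the thread.)

Code: Select all

REM Illustrative only - 750000 is a made-up minimum modulus, not a recommendation.
REM Everything else is left exactly as the untuned command generated it.
D:\Ascential\DataStage\Engine\bin\mkdbfile E:\Input\LineKeys_RowNum_3 30 750000 4 20 50 80 1628

As Ken notes below, the existing file has to be removed from disk before it is recreated.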
chulett wrote:Besides the fact that I have no idea how you will manage to use a hashed file that has a row number as a key,
Regarding this, the reason for going with such an approach is to reduce the number of fields in a hashed file, because the hashed file was crossing the 2GB limit (when 8 keys + 8 data fields were written into a single hashed file). So I stored the 8 keys along with @INROWNUM in one hashed file, and in the remaining Linex_INSTANCE hashed file @INROWNUM is the key and the rest of the 8 data fields are stored.

chulett wrote:your issue as stated (job is not finishing writing quickly to the hashed files) has got nothing to do with your keys. It is the fact that you have Write Cache enabled and you probably haven't sized your hashed files properly.
Regarding this, I would appreciate some guidance on how to size and tune the hashed file properly.
Share to Learn, and Learn to Share.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Why even start a discussion about a composite key? The OP has stated the hashed files all have a single key field.

I would simply concatenate the multiple fields together if I wanted to build a composite key. But again, I wouldn't bother for 4 key fields; that's not going to help with the build speed of the hashed file, and I doubt it would substantially affect the lookup speed either. I recall doing that in older versions, but haven't seen a need to even consider it in the last several years.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

I might be hijacking the thread now. :roll: But I am designing one such job with a database lookup: large volume, with 4 keys. For lookup purposes, I am going to unload only the newly created artificial key. I felt it saves space as well as being more efficient, hence going for a synthetic key. Would you recommend that?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Remove the hashed files from disk, set the minimum modulus, turn off write-delay cache. Now run your job.
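A hedged sketch of those three steps against the file from asitagrawal's earlier post; the modulus figure is the same made-up one used above, and the exact wording of the cache option varies a little between DataStage versions:

Code: Select all

REM 1. Remove the existing hashed file from disk (a type 30 hashed file is a directory).
rmdir /s /q E:\Input\LineKeys_RowNum_3

REM 2. Recreate it with a realistic minimum modulus instead of the default of 1
REM    (750000 is illustrative; size it with the Hashed File Calculator).
D:\Ascential\DataStage\Engine\bin\mkdbfile E:\Input\LineKeys_RowNum_3 30 750000 4 20 50 80 1628

REM 3. In the Hashed File stage, untick the write cache (write-delay cache) option,
REM    then rerun the job.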
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ah... so now the complete picture of what you are doing is starting to emerge. I don't have time for a good answer here; I need to get my butt out the door.

Your Minimum Modulus of 1 (the default) for the hashed files is way too low for your volume. And breaking this one file into two because of the 2GB issue is the hard way to 'fix' the problem; why not simply use a 64BIT hashed file?
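For anyone wondering how the 64BIT route is usually done: the commonly described approach is to convert the file from the engine's command line (Administrator client > Command). A hedged sketch, assuming the UniVerse-style SETFILE and RESIZE commands behave here as documented elsewhere; verify the syntax on your own release before relying on it:

Code: Select all

REM Create a VOC pointer to the pathed hashed file, then convert it to 64-bit addressing.
REM The VOC name LineKeys_RowNum_3 is illustrative.
SETFILE E:\Input\LineKeys_RowNum_3 LineKeys_RowNum_3 OVERWRITING
RESIZE LineKeys_RowNum_3 * * * 64BIT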

I'll leave it to others to pass on the tuning advice.

You also mentioned something about 4 keys, not 8. Please don't tell me you are using 8 to build the hashed file and then 4 for the lookup...
Last edited by chulett on Fri Mar 09, 2007 9:03 am, edited 1 time in total.
-craig

"You can never have too many knives" -- Logan Nine Fingers
asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

Here are some numbers from the performance statistics:

http://spreadsheets.google.com/pub?key= ... N11Rci_7Mg

Will I be able to use a 64-bit hashed file on the Windows machine? I am pretty confused in this regard... please advise.

If created, will I be able to use the hashed file in lookups like a normal hashed file, or will some other care be required?
Share to Learn, and Learn to Share.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

kumar_s wrote:I might be hijacking the thread now. :roll: But I am designing one such job with a database lookup: large volume, with 4 keys. For lookup purposes, I am going to unload only the newly created artificial key. I felt it saves space as well as being more efficient, hence going for a synthetic key. Would you recommend that?
If 'synthetic' means 'generated using CRC32' - don't. Search for posts by mhester mentioning CRC32 - he wrote the utility for Ascential and has very firm opinions on that subject.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Yeah, I think I was even part of one such discussion. In fact, I was the victim, and my comments were criticized to hell. I am curious: if CRC32 doesn't fit this use, where else is it used?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Use the Hashed File Calculator to tune the mkdbfile command. This utility is on your DataStage client CD, in the Utilities\Unsupported folder.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
asitagrawal
Premium Member
Posts: 273
Joined: Wed Oct 18, 2006 12:20 pm
Location: Porto

Post by asitagrawal »

ray.wurlod wrote:Use the Hashed File Calculator to tune the mkdbfile command. This utility is on your DataStage clients CD in the Utilities\Unsupported folder. ...

Dear Ray,

I think this is my weak area... using the HFC... because I don't have a clear understanding of how to determine the record size, which is an important parameter for the HFC. :(
Share to Learn, and Learn to Share.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Total all the characters in a row and add 1 for each column. There are no data types - numbers are stored as character strings.
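A worked example with made-up figures, just to show the arithmetic; the 2048 bytes is the group size of a default dynamic file and 80% its default split load, so this is only a rough starting point compared with what the HFC works out:

Code: Select all

16 columns totalling 240 characters of data  ->  240 + 16 = 256 bytes per record
5,000,000 rows x 256 bytes                   ->  about 1.28 GB of data
1,280,000,000 / (2048 x 0.80)                ->  minimum modulus of roughly 780,000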
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.