Hashed File Problems with large record set
Hello,
I have a question regarding a possible limitation in hashed files.
We have a job that performs a lookup against a hashed file, and it was tested and works correctly with small amounts of data. The same lookup fails to return any values when the hashed file contains 500,000 records.
Debug mode showed every record as containing NULL values (which is incorrect) when the hashed file was this large.
Pre-load to memory was set to disabled and there are no disk issues.
Does anyone have any ideas about this behaviour?
thanks,
SPA
from SPA_BI
When you say "failed to return any values", do you mean that none of the source records had a hit on the hashed file?
Hashed files have size limitations. 2.2 GB is the limit. But this barrier can be overcome by making it a 64 bit hashed file. Search the forum for more information on "how to".
Make sure your hashed file keys are trimmed and the source keys are also trimmed before doing the lookup.
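For what it's worth, converting an existing dynamic hashed file to 64-bit is normally done from TCL (the UniVerse command shell underneath DataStage). The line below is only a sketch - the file name is an example, and you should check the forum "how to" posts for the exact options on your release:

RESIZE MyHashedFile * * * 64BIT

As I understand it, the asterisks retain the file's current parameters and only switch the addressing to 64-bit, which lifts the 2.2 GB ceiling mentioned above.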
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
thanks for the reply.
I should have pointed out that the hash file is a lookup of itself (after an aggregation stage); so there are no issues with the key matching up.
It works well when I cull the number of records down, but it seems the sheer number of records causes the lookup to fail.
So without the 2.2 GB limit being reached, I'm wondering why the lookup works with a small number of records but not with the large amount.
Could resource demands on the server result in this behaviour?
from SPA_BI
I don't know about anyone else, but I'd appreciate a clarification as to what this statement means:
SPA_BI wrote: The hash file is a lookup of itself.
The 'trim' would help get past the classic 'lookup doesn't work' problem - the keys don't match because of extraneous whitespace in one or both values. People load "A" from one source and try to match it to "A " from another.
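Purely as an illustration (the link and column names here are hypothetical), the trim would go on both sides of the match, i.e. in the Transformer derivations:

Key column written to the hashed file:   Trim(SourceLink.CUST_KEY)
Lookup key expression on the stream link:   Trim(InputLink.CUST_KEY)

Trim() strips the leading and trailing spaces so that "A" and "A " end up as the same key value.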
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Please describe your job more completely. In particular: what is the data type of the lookup key, what is the data type of that column coming out of the Aggregator stage, and what happens to the column in the Aggregator stage (is it grouped, or does it have an aggregate function applied to it)? Mention also whether you have read cache and/or write cache enabled in the Hashed File stages.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SPA_BI wrote: ...The hash file is a lookup of itself...
My thought there is that you might have buffered writes turned on, so that a lookup of a record might not return a value because it hasn't actually been written to the file yet. This could happen more frequently when a lot of writes are done and are buffered. As stated earlier, a more detailed description might help clarify that this is not the problem.
If you attempt to write past the default 2 GB limit (since dynamic files are stored in two OS files, with most of the data written to one of them, the actual limit is not exactly predictable and is slightly over 2 GB), you will get write errors and most likely a corrupted file. With 500,000 records you would need an average record length of over 4,096 bytes to exceed 2 GB - is this the case?
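As a quick sanity check on that arithmetic:

2,147,483,648 bytes / 500,000 records = roughly 4,295 bytes per record

so the rows would have to average over about 4 KB each before the 2 GB boundary comes into play. For a dynamic (type 30) file you can also just look at the sizes of the two operating-system files (DATA.30 and OVER.30) inside the hashed file's directory to see how close you actually are.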
I highly doubt it's the 2.2 GB limit issue here, ArndW. As you noted, the job would abort and at least spit out a message in the log file. None of that is happening. I think your analysis about buffered writes might be it. Let's see what the OP comes back with.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.