Hashed File Problems with large record set

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


SPA_BI
Premium Member
Posts: 36
Joined: Tue Aug 29, 2006 8:01 pm
Location: Melbourne

Hashed File Problems with large record set

Post by SPA_BI »

Hello,

I have a question regarding a possible limitation in hashed files.

We have a job that performs a lookup against a hashed file, and it was tested and works fine with small volumes of data. The same lookup failed to return any values once the hashed file contained 500,000 records.
Running the job in Debug mode showed all the looked-up records as containing NULL values (which is incorrect) when the hashed file was that large.

Pre-load to memory was set to disabled and there are no disk issues.

Does anyone have any ideas about this behaviour?

thanks,

SPA
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

When you say "failed to return any values", do you mean that none of the source records had a hit on the hashed file?

Hashed files have size limitations. 2.2 GB is the limit. But this barrier can be overcome by making it a 64 bit hashed file. Search the forum for more information on "how to".
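For example, if the hashed file is a dynamic file created in the project account, it can usually be converted in place from the Administrator client command line (or the TCL prompt). A minimal sketch - the file name MyLookupHash is made up, and you need exclusive access to the file while it runs:

    RESIZE MyLookupHash * * * 64BIT
    ANALYZE.FILE MyLookupHash

The three asterisks keep the existing type, modulus and separation; only the internal addressing is switched to 64-bit. ANALYZE.FILE is only there to confirm the new layout afterwards.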

Make sure your hashed file keys are trimmed and the source keys are also trimmed before doing the lookup.
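For instance, the trim can be done in the Transformer on both sides of the lookup. A rough sketch in DataStage BASIC, assuming a stream link lnk_in and a key column CUST_ID (both names are made up for illustration); use the same expression both where the hashed file is loaded and in the key expression of the reference lookup:

    * Derivation / key expression: strip leading and trailing blanks
    * (Trim also squeezes repeated internal blanks down to one)
    Trim(lnk_in.CUST_ID)

If internal spacing must be preserved exactly, TrimB() (trailing blanks only) and TrimF() (leading blanks only) are alternatives.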
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
SPA_BI
Premium Member
Posts: 36
Joined: Tue Aug 29, 2006 8:01 pm
Location: Melbourne

Post by SPA_BI »

thanks for the reply.

I should have pointed out that the hashed file is a lookup of itself (it is loaded after an aggregation stage), so there are no issues with the keys matching up.
It works well when I cull the number of records down, but it seems like the sheer volume of records causes the lookup to fail.

So, with the 2.2 GB limit not being reached, I'm wondering why the lookup works with a few of the records but not with the full set.
Could resource demands on the server result in this behaviour?
narasimha
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

Do you get any errors/warnings at all, or is it just that the lookups are failing?
As DSguru2B suggested, do a trim before you do the lookup.
If possible share the design of your job.
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
SPA_BI
Premium Member
Posts: 36
Joined: Tue Aug 29, 2006 8:01 pm
Location: Melbourne

Post by SPA_BI »

thanks for the reply,

There are no warnings/errors

The hashed file is a lookup of itself. It works fine with a smaller set (even using records from the larger set), but when the whole 500,000 are pushed through, the lookup fails to match any of the records.

How would the trim be of benefit?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

SPA_BI wrote:The hash file is a lookup of itself.
I don't know about anyone else, but I'd appreciate a clarification as to what this statement means.

The 'trim' would help get past the classic 'lookup doesn't work' problem - the keys don't match because of extraneous whitespace in one or both values. People load "A" from one source and try to match it to "A " from another.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Please describe your job more completely. In particular what is the data type of the lookup key, what is the data type of that column coming out of the Aggregator stage, and what happens to the column in the Aggregator stage (is it grouped or does it have an aggregate function applied to it)? Mention also whether you have read cache and/or write cache enabled in the Hashed File stages.
Last edited by ray.wurlod on Mon Jan 29, 2007 2:15 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

SPA_BI wrote:...The hash file is a lookup of itself...
My thought there is that you might have buffered writes turned on, so that a lookup of a record might not return a value because it hasn't actually been written to the file yet. This happens more often when a lot of writes are being done and buffered. As stated earlier, a more detailed description of the job would help confirm whether this is the problem.

If you attempt to write past the default 2 GB limit (since dynamic files are stored in two OS files, with most of the data written to one of them, the actual limit is not predictable and is slightly over 2 GB), you will get write errors and most likely a corrupted file. With 500,000 records you would need an average record length of over 4096 bytes to exceed 2 GB - is this the case?
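(To make the arithmetic explicit: the 32-bit limit is 2^31 = 2,147,483,648 bytes, and 2,147,483,648 / 500,000 records is roughly 4,295 bytes per record, so the rows would have to average well over 4 KB each before the file got anywhere near that limit.)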
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

I highly doubt it's the 2.2 GB limit issue here, ArndW. As you noted, the job would abort and at least spit out a message in the log. None of that is happening. I think your analysis about buffered writes might be it. Let's see what the OP comes back with.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
SPA_BI
Premium Member
Posts: 36
Joined: Tue Aug 29, 2006 8:01 pm
Location: Melbourne

Post by SPA_BI »

Thanks for the help guys - it turned out to be an incorrect aggregation being performed on the key.

Closing request.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Thanks for the feedback. We were fairly sure the problem wasn't in the hashed file itself.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.