64-bit hashed file bad performance

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

That's enough information, Ken. As always, you rock, and we learn.
Regards,
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
gsym
Charter Member
Posts: 118
Joined: Thu Feb 02, 2006 3:05 pm

Post by gsym »

Thank you all,
I am working on a 2-CPU PeopleSoft EPM system.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

chulett wrote:How much benefit is there to using a Distributed Hashed File in this situation? I've got a similar issue where multiple copies of a particular hashed file are being created by an MI job that mods the input stream and distributes the result across the X hashed files.

However, the final job that consumes them doesn't currently follow the same rules for some reason, so it ends up with all of them in it. Meaning they all get looked up against, in spite of the known fact that only one will get a hit. Of course, it 'adversely impacts' the processing speed and the all-important rows/second metric goes into the toilet. :evil: :lol:

I've been exploring re-arranging everything to do exactly what Ken stated (in my spare time, ha!), but with the reminder that DHFs exist, I'm wondering if they might be a 'quickie' way to get this back to 'one lookup' in the interim. Is it worth considering? I haven't looked at them in detail for three years, not since I sat through a presentation on the subject at the 2004 Ascential World. Pros/Cons?
Somehow, some way, you're going to have to hit them all. But with a DF you only hit the appropriate one for each record. I think, on balance, that's a "pro".

To define the DF there must be some unique characteristic of the key value that identifies which of the part files it belongs in - the Mod() that you suggest would be ideal, as it is cheaply calculated.
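To make the routing idea concrete, here is a minimal sketch (in Python rather than DataStage BASIC, purely for illustration) of the Mod()-based distribution described above: each key maps to exactly one part file, so both the load and the lookup touch only that one part. The part count of 4 and the sample key values are assumptions, not anything from the thread.

```python
NUM_PARTS = 4  # assumed number of part files in the distributed hashed file

def part_for_key(key: int, num_parts: int = NUM_PARTS) -> int:
    """Cheaply calculated routing, equivalent in spirit to Mod(key, num_parts):
    every key belongs to exactly one part file."""
    return key % num_parts

# Simulate the X part files as in-memory dictionaries, one per part number.
part_files = {p: {} for p in range(NUM_PARTS)}

# "Load" phase: the distributing job writes each record only to its own part.
for key in [10, 11, 12, 13, 14]:
    part_files[part_for_key(key)][key] = f"row-{key}"

# "Lookup" phase: hit only the appropriate part file, never all of them.
def lookup(key: int):
    return part_files[part_for_key(key)].get(key)
```

The point of the sketch is the asymmetry Ray describes: without the routing rule, every lookup would probe all X files even though at most one can hit; with it, both cost and I/O are confined to a single part.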

The alternative is a huge hashed file, and the time hit needed to load it. I think that's a "pro" for DF as well.

The major "con" is that you're precluded from using hashed file cache.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Thanks Ray.

And from what I recall - precluded regardless of actual hashed file size, yes?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

That is correct. But why would you want a DF with less than 999 MB of reference data?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Just wanted a clarification, is all.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

That's easy. Buy yourself some ghee.
:lol:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:roll: :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply