64-bit hashed file bad performance

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

That's enough information, Ken. As always, you rock, and we learn.
Regards,
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
gsym
Charter Member
Posts: 118
Joined: Thu Feb 02, 2006 3:05 pm

Post by gsym »

Thank you all,
I am working on a 2-CPU PeopleSoft EPM system.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

chulett wrote:How much benefit is there to using a Distributed Hashed File in this situation? I've got a similar issue where multiple copies of a particular hashed file are being created by an MI job that mods the input stream and distributes the result across the X hashed files.

However, the final job that consumes them doesn't currently follow the same rules for some reason, so it ends up with all of them in it. Meaning they all get looked up against, in spite of the known fact that only one will get a hit. Of course, it 'adversely impacts' the processing speed and the all-important rows/second metric goes into the toilet. :evil: :lol:

I've been exploring re-arranging everything to do exactly what Ken stated (in my spare time, ha!), but with the reminder that DHFs exist, I'm wondering if they might be a 'quickie' way to get this back to 'one lookup' in the interim. Is it worth considering? I haven't looked at them in detail for three years, not since I sat through a presentation on the subject at the 2004 Ascential World. Pros/Cons?
Somehow, some way, you're going to have to hit them all. But with a DF you only hit the appropriate one for each record. I think, on balance, that's a "pro".

To define the DF there must be some unique characteristic of the key value that identifies which of the part files it belongs in - the Mod() that you suggest would be ideal, as it is cheaply calculated.
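To make the routing idea concrete, here is a minimal sketch (in Python rather than DataStage BASIC, purely for illustration) of the Mod()-based distribution described above: each key maps to exactly one part file, so both the load and the lookup touch only that one part. The part count of 4 and the sample key values are assumptions, not anything from the thread.

```python
NUM_PARTS = 4  # assumed number of part files in the distributed hashed file

def part_for_key(key: int, num_parts: int = NUM_PARTS) -> int:
    """Cheaply calculated routing, equivalent in spirit to Mod(key, num_parts):
    every key belongs to exactly one part file."""
    return key % num_parts

# Simulate the X part files as in-memory dictionaries, one per part number.
part_files = {p: {} for p in range(NUM_PARTS)}

# "Load" phase: the distributing job writes each record only to its own part.
for key in [10, 11, 12, 13, 14]:
    part_files[part_for_key(key)][key] = f"row-{key}"

# "Lookup" phase: hit only the appropriate part file, never all of them.
def lookup(key: int):
    return part_files[part_for_key(key)].get(key)
```

The point of the sketch is the asymmetry Ray describes: without the routing rule, every lookup would probe all X files even though at most one can hit; with it, both cost and I/O are confined to a single part.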

The alternative is a huge hashed file, and the time hit needed to load it. I think that's a "pro" for DF as well.

The major "con" is that you're precluded from using hashed file cache.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Thanks Ray.

And from what I recall - precluded regardless of actual hashed file size, yes?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

That is correct. But why would you want a DF with less than 999 MB of reference data?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Just wanted a clarification, is all.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

That's easy. Buy yourself some ghee.
:lol:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:roll: :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply