which is faster hash file join or parallel join stage?

Pavan_Yelugula
Premium Member
Posts: 133
Joined: Tue Nov 23, 2004 11:24 pm
Location: India

Post by Pavan_Yelugula »

Hi all,
Can you tell me which of the two is faster: the hashed file lookup in a Server job or the Join stage in Parallel Extender (PX)?

Suppose I build the hashed file lookup on the Server canvas, put it in a shared container, and run that container in a Parallel Extender job. Will that make my job faster than simply using a Join stage in the parallel job?

It would be great if you could also tell me which one is better, and why one is faster than the other.

Thanks and regards
Pavan
ogmios
Participant
Posts: 659
Joined: Tue Mar 11, 2003 3:40 pm

Re: which is faster hash file join or parallel join stage?

Post by ogmios »

I'm not that strong on PX, but I doubt you can use a hashed file lookup in PX jobs. Stay within the same group of functionality, so use a Join stage. Or do the join directly in the database :wink:

Ogmios
In theory there's no difference between theory and practice. In practice there is.
T42
Participant
Posts: 499
Joined: Thu Nov 11, 2004 6:45 pm

Post by T42 »

If you can use the Lookup stage (i.e. if your reference data is smaller than the amount of memory you have), use it! It is VERY fast, especially when processing on multiple nodes.

The Join stage forces a sort on your data whether you ask for it or not (it's done within the framework; see the Join stage section of your help guide). However, it is still plenty fast, especially on multiple nodes. Hashed files are effectively "1 node", so a 16-node join would definitely be faster on a 16-CPU box.
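
To make the trade-off concrete, here is a minimal Python sketch (not DataStage code) of the two general techniques being compared: an in-memory keyed lookup, which needs the reference data to fit in memory but no sorting, and a sort-merge join, which sorts both inputs first. The function names, the sample rows, and the assumption that reference keys are unique are all made up for illustration; how the Lookup and Join stages are actually implemented inside the engine is not something this sketch claims to show.

```python
def hash_lookup_join(stream, reference, key):
    """In-memory lookup: index the reference rows, then probe once per stream row."""
    # Assumes the reference set fits in memory and its keys are unique.
    index = {row[key]: row for row in reference}
    return [{**row, **index[row[key]]} for row in stream if row[key] in index]


def sort_merge_join(left, right, key):
    """Sort-merge join: sort both inputs on the key, then walk them in step."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Emit every right row that shares this key, then advance the left side.
            k = j
            while k < len(right) and right[k][key] == lk:
                out.append({**left[i], **right[k]})
                k += 1
            i += 1
    return out


# Hypothetical sample data: a stream of orders and a small customer reference set.
orders = [
    {"cust_id": 3, "amount": 10},
    {"cust_id": 1, "amount": 25},
    {"cust_id": 9, "amount": 5},
]
customers = [
    {"cust_id": 1, "name": "A"},
    {"cust_id": 3, "name": "B"},
]

looked_up = hash_lookup_join(orders, customers, "cust_id")
joined = sort_merge_join(orders, customers, "cust_id")
```

Both functions return the same matched rows here. The lookup keeps the stream in its arrival order and never sorts, while the merge pays for two sorts up front but does not need either input to fit in a single in-memory index.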
alexysflores
Participant
Posts: 18
Joined: Mon Jan 12, 2004 7:20 am
Location: USA

Re: which is faster hash file join or parallel join stage?

Post by alexysflores »

I think you're asking for problems with that job design. A Join stage forces a sort, which is typically a resource hog in both PX and Server jobs. The sort algorithms in DataStage are not so great.
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
By definition, the Enterprise Edition (PX) should give better performance, as long as the resources for parallel execution are available.

If you have a situation where this is not true, you probably don't have your system configured correctly, or the job is so small in data volume that it doesn't need the PX engine.

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org