which is faster hash file join or parallel join stage?

Posted: Fri Dec 31, 2004 2:51 am
by Pavan_Yelugula
Hi all,
Can you tell me which of the two is faster: the hash file lookup of Server or the Join stage of Parallel Extender?

Suppose I do the hash file lookup on my Server canvas, put it in a shared container, and execute it on Parallel Extender. Will this make my job faster than just using a Join stage on Parallel Extender?

It would be great if you could also tell me which one is better, and why one is faster than the other.

Thanks and regards
Pavan

Re: which is faster hash file join or parallel join stage?

Posted: Fri Dec 31, 2004 5:42 am
by ogmios
I'm not so good on PX, but I doubt you can use a hash file lookup in PX jobs. Stay within the same group of functionality, so use a Join stage. Or do the join directly in the database :wink:

Ogmios

Posted: Fri Dec 31, 2004 9:30 pm
by T42
If you can use the Lookup stage (i.e. if your reference data is smaller than the amount of memory you have), use it! It is VERY fast, especially with multiple-node processing.

The Join stage forces a sort on your data whether you ask for it or not (it's done within the framework; see your help guide on the Join stage). However, it is plenty fast, especially on multiple nodes. Hash files are "1 node", so a 16-node join would definitely be faster on a 16-CPU box.
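
To make the difference concrete, here is a rough sketch in plain Python (not DataStage code, and not how the PX framework is actually implemented). The function names and sample records are made up for illustration. The first function mimics an in-memory lookup: the reference data is loaded into a hash table once and each input row is probed against it, which only works if the reference data fits in memory. The second mimics a join that sorts both inputs on the key and then merges them in a single pass.

```python
from operator import itemgetter


def hash_lookup_join(stream, reference, key):
    """Lookup-style join: build an in-memory table, then probe it per row.

    Assumes reference keys are unique and the reference data fits in memory.
    """
    table = {ref[key]: ref for ref in reference}
    out = []
    for row in stream:
        match = table.get(row[key])
        if match is not None:
            out.append({**row, **match})
    return out


def sort_merge_join(left, right, key):
    """Join-style: sort both inputs on the key, then merge in one pass."""
    left = sorted(left, key=itemgetter(key))
    right = sorted(right, key=itemgetter(key))
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Pair this left row with every right row that shares the key.
            k = j
            while k < len(right) and right[k][key] == lk:
                out.append({**left[i], **right[k]})
                k += 1
            i += 1
    return out
```

Both return the same matched rows when the reference keys are unique. The lookup skips the sort but needs the whole reference set in memory; the sort-merge version pays for the sort but only needs to walk the two sorted inputs side by side.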

Re: which is faster hash file join or parallel join stage?

Posted: Sat Jan 01, 2005 2:18 pm
by alexysflores
I think you're looking at a problematic job design: a join stage forces a sort, which is typically a resource hog in both PX and Server jobs. The sort algorithms in DataStage are not so great.

Posted: Sat Jan 01, 2005 5:55 pm
by roy
Hi,
By definition, the Enterprise Edition (PX) should give better performance, as long as resources for parallel execution are available.

If you have a situation where this is not true, you probably either don't have your system configured correctly
or have a job so small in data volume terms that it doesn't need the PX engine.

IHTH,