Which is faster: hash file join or parallel Join stage?

Post by **Pavan_Yelugula** » Fri Dec 31, 2004 2:51 am

Hi all,
Can you tell me which of the two is faster: the hash file lookup in a server job or the Join stage in Parallel Extender?

Suppose I do the hash file lookup on my server canvas, put it in a shared container, and execute it in Parallel Extender. Will this make my job faster than just using a Join stage in Parallel Extender?

It would be great if you could also tell me which one is better, or why one is faster than the other.

Thanks and regards,
Pavan

Post by **ogmios** » Fri Dec 31, 2004 5:42 am

I'm not so good on PX, but I doubt you can use a hash file lookup in PX jobs. Stay within the same group of functionality and use a Join stage, or do the join directly in the database.

Ogmios

Post by **T42** » Fri Dec 31, 2004 9:30 pm

If you can use a Lookup stage (i.e., if your reference data is smaller than the amount of memory you have), use it! It is VERY fast, especially with multi-node processing.
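T42's advice rests on how an in-memory lookup works: build a hash table from the reference data once, then probe it once per input row, with no sorting at all. Below is a minimal Python sketch of that idea, not DataStage's implementation; the function `hash_lookup`, the dict-per-row representation, and the choice to pass unmatched rows through unchanged are assumptions for illustration only.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]  # hypothetical row representation: column name -> value


def hash_lookup(stream: Iterable[Row], reference: List[Row], key: str) -> List[Row]:
    """Enrich each stream row with the matching reference row, if any."""
    # Build phase: load the reference data into an in-memory dict keyed on the
    # join column. This is only viable when the reference set fits in memory.
    # If a key appears more than once in the reference data, the last row wins.
    table: Dict[Any, Row] = {row[key]: row for row in reference}

    out: List[Row] = []
    # Probe phase: one average O(1) dict lookup per input row; nothing is sorted.
    for row in stream:
        merged = dict(row)
        match = table.get(row[key])
        if match is not None:
            merged.update(match)
        out.append(merged)  # unmatched rows pass through unchanged
    return out


orders = [{"cust_id": 1, "amt": 10}, {"cust_id": 3, "amt": 5}]
customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]
print(hash_lookup(orders, customers, "cust_id"))
# [{'cust_id': 1, 'amt': 10, 'name': 'Ann'}, {'cust_id': 3, 'amt': 5}]
```

The memory condition T42 mentions shows up directly here: the whole `reference` list has to become one dict before any input row can be processed.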

The Join stage forces a sort on your data whether you ask for it or not (the framework does it for you; see the help guide on the Join stage). However, it is plenty fast, especially on multiple nodes. Hash files are "1 node", so a 16-node join would definitely be faster on a 16-CPU box.
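The forced sort comes from the join strategy itself: a sort-merge join can only pair rows up once both inputs are ordered on the key. The following is a minimal Python sketch of an inner sort-merge join, assuming rows are plain dicts; `sort_merge_join` is a hypothetical name, and the sketch does not model how the DataStage framework inserts or performs its sorts.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]  # hypothetical row representation: column name -> value


def sort_merge_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner join: sort both inputs on the key, then merge them in one pass."""
    # Sort phase: both inputs must be ordered on the key before merging.
    # This is the extra cost that an in-memory hash lookup avoids.
    left_sorted = sorted(left, key=lambda r: r[key])
    right_sorted = sorted(right, key=lambda r: r[key])

    out: List[Row] = []
    i = j = 0
    while i < len(left_sorted) and j < len(right_sorted):
        lk = left_sorted[i][key]
        rk = right_sorted[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows on the right that share this key.
            j_end = j
            while j_end < len(right_sorted) and right_sorted[j_end][key] == lk:
                j_end += 1
            # Pair every left row with this key against that whole run.
            while i < len(left_sorted) and left_sorted[i][key] == lk:
                for rrow in right_sorted[j:j_end]:
                    merged = dict(left_sorted[i])
                    merged.update(rrow)
                    out.append(merged)
                i += 1
            j = j_end
    return out


left = [{"k": 2, "a": "x"}, {"k": 1, "a": "y"}, {"k": 2, "a": "z"}]
right = [{"k": 2, "b": "p"}, {"k": 3, "b": "q"}]
print(sort_merge_join(left, right, "k"))
# [{'k': 2, 'a': 'x', 'b': 'p'}, {'k': 2, 'a': 'z', 'b': 'p'}]
```

Once sorted, the merge itself is a single linear pass over both inputs, and neither input has to fit in memory as a lookup table, which is why the approach still scales well when the sort can be spread across nodes.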

Post by **alexysflores** » Sat Jan 01, 2005 2:18 pm

[quote="Pavan_Yelugula"]Hi all,
Can you tell me which of the two is faster: the hash file lookup in a server job or the Join stage in Parallel Extender?

Suppose I do the hash file lookup on my server canvas, put it in a shared container, and execute it in Parallel Extender. Will this make my job faster than just using a Join stage in Parallel Extender?

It would be great if you could also tell me which one is better, or why one is faster than the other.

Thanks and regards,
Pavan[/quote]

I think you're asking for trouble with that job design. A Join stage forces a sort, which is typically a resource hog in both PX and server jobs, and the sort algorithms in DataStage are not so great.

Post by **roy** » Sat Jan 01, 2005 5:55 pm

Hi,
By design, the Enterprise Edition (PX) should give better performance, as long as resources for parallel execution are available.

If you have a situation where this is not true, you probably either don't have your system configured correctly or have a job so small in data volume that it doesn't need the PX engine.
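Part of why parallel execution pays off only when the resources exist is that the work is split by key: if both inputs are partitioned on the join key with the same function, matching rows always land in the same partition, so each partition can be joined independently on its own node. Below is a minimal Python sketch of that idea, not DataStage code; `hash_partition` and `partitioned_join` are hypothetical names, Python's built-in `hash` stands in for whatever partitioning function an engine would use, and the partitions are processed one after another rather than truly in parallel.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]  # hypothetical row representation: column name -> value


def hash_partition(rows: List[Row], key: str, n: int) -> List[List[Row]]:
    """Split rows into n partitions; rows with equal keys always share a partition."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def partitioned_join(left: List[Row], right: List[Row], key: str, n: int) -> List[Row]:
    """Inner join done independently inside each of n key partitions."""
    left_parts = hash_partition(left, key, n)
    right_parts = hash_partition(right, key, n)
    out: List[Row] = []
    # Partition i of the left input can only match partition i of the right
    # input, so the n joins below are independent. A parallel engine could run
    # each one on its own node; this sketch just runs them sequentially.
    for lpart, rpart in zip(left_parts, right_parts):
        table: Dict[Any, List[Row]] = {}
        for rrow in rpart:
            table.setdefault(rrow[key], []).append(rrow)
        for lrow in lpart:
            for rrow in table.get(lrow[key], []):
                merged = dict(lrow)
                merged.update(rrow)
                out.append(merged)
    return out


left = [{"k": i, "a": i * 10} for i in range(10)]
right = [{"k": i, "b": -i} for i in range(0, 10, 2)]
result = partitioned_join(left, right, "k", n=4)
print(sorted(row["k"] for row in result))
# [0, 2, 4, 6, 8]
```

With a single partition this degenerates to one ordinary join, which mirrors the point above: on a small job or an under-resourced box there is nothing for the extra partitions to gain.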

IHTH,

DSXchange
