Parallel Extraction - non-collocated joins

vigneshra · Post by **vigneshra** » Mon Oct 17, 2005 3:33 am

Hello Everyone,

I have a question here.

Say, we are having 2 huge tables (each having around 700 millions of records). One table is partitioned on column1 where as another table is partitioned on column2. Is it advisable to do a parallel extraction from a query that joins these two tables (which is essentially a non-collocated join) using node number clause containing partitioning key of any one of the two tables? Will it not create any problem? Please clarify!

(P.S. I cannot do this join in ETL as the number of records are very huge. Is there any work around for this?)

ray.wurlod · Post by **ray.wurlod** » Mon Oct 17, 2005 3:49 pm

Can't be done unless the data are partitioned on the join key(s) using the same partitioning algorithm or the Entire partitioning algorithm for the probe table. Otherwise you can't be certain that you will find the row in the probe table on the partition where the driver row exists.

Given that the probe table is large, you probably don't want to use the Entire partitioning algorithm, which puts every row from the probe table onto every partition.

You're pretty much limited to an unpartitioned join - or a database re-design - given that the tables are partitioned differently.

vigneshra · Post by **vigneshra** » Wed Oct 19, 2005 1:45 am

Ray,

So you mean to say that we need to repartition the table to make collocated. Or otherwise we need to create one more table which is partitioned on the required keys and then move all the records from the original table to this table and do a join. Am i right?

ray.wurlod · Post by **ray.wurlod** » Wed Oct 19, 2005 4:09 pm

Either of those options will do it, or using Entire partitioning. Make sure you throw large hardware at it in the latter case.