Page 1 of 1

Parallel Extraction - non-collocated joins

Posted: Mon Oct 17, 2005 3:33 am
by vigneshra
Hello Everyone,

I have a question here.

Say, we are having 2 huge tables (each having around 700 millions of records). One table is partitioned on column1 where as another table is partitioned on column2. Is it advisable to do a parallel extraction from a query that joins these two tables (which is essentially a non-collocated join) using node number clause containing partitioning key of any one of the two tables? Will it not create any problem? Please clarify! :roll:

(P.S. I cannot do this join in ETL as the number of records are very huge. Is there any work around for this?)

Posted: Mon Oct 17, 2005 3:49 pm
by ray.wurlod
Can't be done unless the data are partitioned on the join key(s) using the same partitioning algorithm or the Entire partitioning algorithm for the probe table. Otherwise you can't be certain that you will find the row in the probe table on the partition where the driver row exists.

Given that the probe table is large, you probably don't want to use the Entire partitioning algorithm, which puts every row from the probe table onto every partition.

You're pretty much limited to an unpartitioned join - or a database re-design - given that the tables are partitioned differently.

Posted: Wed Oct 19, 2005 1:45 am
by vigneshra
Ray,

So you mean to say that we need to repartition the table to make collocated. Or otherwise we need to create one more table which is partitioned on the required keys and then move all the records from the original table to this table and do a join. Am i right?

Posted: Wed Oct 19, 2005 4:09 pm
by ray.wurlod
Either of those options will do it, or using Entire partitioning. Make sure you throw large hardware at it in the latter case.