Parallel Extraction - non-collocated joins

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
vigneshra
Participant
Posts: 86
Joined: Wed Jun 09, 2004 6:07 am
Location: Chennai

Parallel Extraction - non-collocated joins

Post by vigneshra »

Hello Everyone,

I have a question here.

Say, we are having 2 huge tables (each having around 700 millions of records). One table is partitioned on column1 where as another table is partitioned on column2. Is it advisable to do a parallel extraction from a query that joins these two tables (which is essentially a non-collocated join) using node number clause containing partitioning key of any one of the two tables? Will it not create any problem? Please clarify! :roll:

(P.S. I cannot do this join in ETL as the number of records are very huge. Is there any work around for this?)
Vignesh.

"A conclusion is simply the place where you got tired of thinking."
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Can't be done unless the data are partitioned on the join key(s) using the same partitioning algorithm or the Entire partitioning algorithm for the probe table. Otherwise you can't be certain that you will find the row in the probe table on the partition where the driver row exists.

Given that the probe table is large, you probably don't want to use the Entire partitioning algorithm, which puts every row from the probe table onto every partition.

You're pretty much limited to an unpartitioned join - or a database re-design - given that the tables are partitioned differently.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vigneshra
Participant
Posts: 86
Joined: Wed Jun 09, 2004 6:07 am
Location: Chennai

Post by vigneshra »

Ray,

So you mean to say that we need to repartition the table to make collocated. Or otherwise we need to create one more table which is partitioned on the required keys and then move all the records from the original table to this table and do a join. Am i right?
Vignesh.

"A conclusion is simply the place where you got tired of thinking."
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Either of those options will do it, or using Entire partitioning. Make sure you throw large hardware at it in the latter case.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply