Cartesian Join in PX

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
ann_nalinee
Premium Member
Premium Member
Posts: 22
Joined: Wed Sep 17, 2003 12:21 pm
Location: Sydney

Cartesian Join in PX

Post by ann_nalinee »

Hi Gurus,

How could I perform cartesian join in Datastage PX? My requirement is to get the cartesian join from multiple input files.

I have tried to use Join Stage to do it, but it seems that for Full Outer Join it allows only 2 input links. And I could not perform cartesian join in Database either.

If you have any idea how to perform cartesian join in PX, please suggest.

Thanks
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard.

The Join stage will perform a Cartesian products for Full Outer Join if there are duplicates on the Left input.

You are right that a maximum of two inputs is specified for Full Outer Join.

Use more than one Join stage in a cascade. For example for three inputs you would use two Join stages, the second performing a Join onto the result of the first.

Code: Select all

    Source1 ------>
                     Join1  ----->
    Source2 ------>                Join2  ----->
    Source3 --------------------->
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ann_nalinee
Premium Member
Premium Member
Posts: 22
Joined: Wed Sep 17, 2003 12:21 pm
Location: Sydney

Post by ann_nalinee »

Hi Rays,

Thanks for quick reply.

I have thought about that way as well. But I have 7 source files to perform cartesian join. This job would be massive.

But if this is only last choice, I would have to go this way.

Cheers,
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

A seven way Cartesian join is going to be massive no matter how you undertake it. With only 10 rows per input, you're looking at 10 million rows (10^7) on the output.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Nagin
Charter Member
Charter Member
Posts: 89
Joined: Thu Jan 26, 2006 12:37 pm

Post by Nagin »

Hey Ann,
This topic is marked as having a work around. Do you mind to share how you resolved this issue? I have two files with around 1500 disctinct values (including the non key columns) in both.
Here is the lay out.

file-1
x 1
x 2
x 3
x 4
x 5

file -2

x null 12 34
x null 13 24
x null 11 22
x null 45 23
x null 34 56

My job has to get the cartesian product, the second column values in the file-1 has to replace the null in file-2 for all the columns,

the expected out put is like this

x 1 12 34
x 1 13 24
x 1 11 22
x 1 45 23
x 1 34 56
x 2 12 34
x 2 13 24
x 2 11 22
x 2 45 23
x 2 34 56
x 3 12 34
x 3 13 24
x 3 11 22
x 3 45 23
x 3 34 56
etc...

Thanks for your help.
ann_nalinee
Premium Member
Premium Member
Posts: 22
Joined: Wed Sep 17, 2003 12:21 pm
Location: Sydney

Post by ann_nalinee »

Hi Nagin,

Finally, we end up with loading the source files into temporary tables and perform cartesian join in Database.

Cheers,
sjchanda
Participant
Posts: 1
Joined: Thu Mar 17, 2011 8:09 am

Cross join simplest option

Post by sjchanda »

To do a cross join with two files add a dummy column to both the files say "key" with a value of 1 for all rows. Then use join stage in inner join mode. For more than two files repeat the process with more such key columns and cascading inner joins.
Post Reply