Page 1 of 1

Avoid Cartesian Products

Posted: Thu Feb 21, 2008 5:45 am
by hitmanthesilentassasin
Hi,

I have a requirement where i have to map 1:1 relationship between 2 tables. below is the example of the data

Table1
--------------------------
Col1 || Col2 || Col3
1 || A || 0608
1 || B || 0708
1 || C || 0808

Table 2
--------------------------
Col1 || Col2 || Col3
1 || A || 0615
1 || B || 0715
1 || C || 0915

From the above mentioned table I want the data to be joined using Col1 only, But, first row of the first table should be joined with the first row of the second table and second row of the first table should be joined with the second row of the second table. The solution I could think of is using a join stage that will give me the cartesian product and then filtering the data. Since, the volume of the data is way too high I wanted to avoid the cartesian prodcut. Any comments or suggestions would be appreciated.

Thanks,
Waseem

Posted: Thu Feb 21, 2008 6:30 am
by ray.wurlod
Put a Column Generator stage on each input to the Join stage, and use it to generate an integer sequence. Allow appropriately for partitioning if running in parallel. Make sure you get the same sequence on each input. Then use that as your join key.

Posted: Thu Feb 21, 2008 7:20 am
by hitmanthesilentassasin
But that will join me the first row of table 1 with the first row of table 2 and second row of table 2 with second row of table 2. Look at the below example for more details:

Table1
--------------------------
Col1 || Col2 || Col3
1 || A || 0608
1 || B || 0708
1 || C || 0808


Table 2
--------------------------
Col1 || Col2 || Col3
1 || A || 0615
1 || C || 0915
1 || B || 0715

In the above example the sequence of the second row is changed but the second row of the table 1 should be matched with the third row of the table 2.

Posted: Thu Feb 21, 2008 8:13 am
by chulett
:? You can only know that if you join on the two key columns not one. Otherwise it will need to be purely positional.

Posted: Thu Feb 21, 2008 8:31 am
by hitmanthesilentassasin
It can neither be positional nor on both the keys. One way is to join on the first key and then filter out the data but by doing so i would have the cartesian product created and then the data would be filtered out. So, I want to avoid the cartesian product.

Posted: Thu Feb 21, 2008 8:41 am
by chulett
You "avoid the cartesian product" by joining on the proper keys. Why in the world can you not simply do that here? :?

If you "can't do that" for some reason, I really don't see how you can avoid the cartesian-then-filter approach. However, you could arrange for all of that to happen in the source sql, I would imagine.
But that will join me the first row of table 1 with the first row of table 2 and second row of table 1 with second row of table 2.
I also wanted to point out that Ray gave you the solution that he did because this is exactly what you asked for in the original post.

Posted: Thu Feb 21, 2008 11:27 pm
by hitmanthesilentassasin
Thanks for the replies. It seems I am not sure in my first post.

Posted: Thu Feb 21, 2008 11:58 pm
by ray.wurlod
You either want Cartesian products, as one reading shows, or you don't, as another reading shows. But there is no way to "avoid" Cartesian product if what you need is the result of a Cartesian product.