Avoid Cartesian Products
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 150
- Joined: Tue Mar 13, 2007 1:17 am
Avoid Cartesian Products
Hi,
I have a requirement where i have to map 1:1 relationship between 2 tables. below is the example of the data
Table1
--------------------------
Col1 || Col2 || Col3
1 || A || 0608
1 || B || 0708
1 || C || 0808
Table 2
--------------------------
Col1 || Col2 || Col3
1 || A || 0615
1 || B || 0715
1 || C || 0915
From the above mentioned table I want the data to be joined using Col1 only, But, first row of the first table should be joined with the first row of the second table and second row of the first table should be joined with the second row of the second table. The solution I could think of is using a join stage that will give me the cartesian product and then filtering the data. Since, the volume of the data is way too high I wanted to avoid the cartesian prodcut. Any comments or suggestions would be appreciated.
Thanks,
Waseem
I have a requirement where i have to map 1:1 relationship between 2 tables. below is the example of the data
Table1
--------------------------
Col1 || Col2 || Col3
1 || A || 0608
1 || B || 0708
1 || C || 0808
Table 2
--------------------------
Col1 || Col2 || Col3
1 || A || 0615
1 || B || 0715
1 || C || 0915
From the above mentioned table I want the data to be joined using Col1 only, But, first row of the first table should be joined with the first row of the second table and second row of the first table should be joined with the second row of the second table. The solution I could think of is using a join stage that will give me the cartesian product and then filtering the data. Since, the volume of the data is way too high I wanted to avoid the cartesian prodcut. Any comments or suggestions would be appreciated.
Thanks,
Waseem
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Put a Column Generator stage on each input to the Join stage, and use it to generate an integer sequence. Allow appropriately for partitioning if running in parallel. Make sure you get the same sequence on each input. Then use that as your join key.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 150
- Joined: Tue Mar 13, 2007 1:17 am
But that will join me the first row of table 1 with the first row of table 2 and second row of table 2 with second row of table 2. Look at the below example for more details:
Table1
--------------------------
Col1 || Col2 || Col3
1 || A || 0608
1 || B || 0708
1 || C || 0808
Table 2
--------------------------
Col1 || Col2 || Col3
1 || A || 0615
1 || C || 0915
1 || B || 0715
In the above example the sequence of the second row is changed but the second row of the table 1 should be matched with the third row of the table 2.
Table1
--------------------------
Col1 || Col2 || Col3
1 || A || 0608
1 || B || 0708
1 || C || 0808
Table 2
--------------------------
Col1 || Col2 || Col3
1 || A || 0615
1 || C || 0915
1 || B || 0715
In the above example the sequence of the second row is changed but the second row of the table 1 should be matched with the third row of the table 2.
-
- Participant
- Posts: 150
- Joined: Tue Mar 13, 2007 1:17 am
You "avoid the cartesian product" by joining on the proper keys. Why in the world can you not simply do that here?
If you "can't do that" for some reason, I really don't see how you can avoid the cartesian-then-filter approach. However, you could arrange for all of that to happen in the source sql, I would imagine.
![Confused :?](./images/smilies/icon_confused.gif)
If you "can't do that" for some reason, I really don't see how you can avoid the cartesian-then-filter approach. However, you could arrange for all of that to happen in the source sql, I would imagine.
I also wanted to point out that Ray gave you the solution that he did because this is exactly what you asked for in the original post.But that will join me the first row of table 1 with the first row of table 2 and second row of table 1 with second row of table 2.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 150
- Joined: Tue Mar 13, 2007 1:17 am
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
You either want Cartesian products, as one reading shows, or you don't, as another reading shows. But there is no way to "avoid" Cartesian product if what you need is the result of a Cartesian product.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.