Avoid Cartesian Products

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
hitmanthesilentassasin
Participant
Posts: 150
Joined: Tue Mar 13, 2007 1:17 am

Avoid Cartesian Products

Post by hitmanthesilentassasin »

Hi,

I have a requirement where i have to map 1:1 relationship between 2 tables. below is the example of the data

Table1
--------------------------
Col1 || Col2 || Col3
1 || A || 0608
1 || B || 0708
1 || C || 0808

Table 2
--------------------------
Col1 || Col2 || Col3
1 || A || 0615
1 || B || 0715
1 || C || 0915

From the above mentioned table I want the data to be joined using Col1 only, But, first row of the first table should be joined with the first row of the second table and second row of the first table should be joined with the second row of the second table. The solution I could think of is using a join stage that will give me the cartesian product and then filtering the data. Since, the volume of the data is way too high I wanted to avoid the cartesian prodcut. Any comments or suggestions would be appreciated.

Thanks,
Waseem
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Put a Column Generator stage on each input to the Join stage, and use it to generate an integer sequence. Allow appropriately for partitioning if running in parallel. Make sure you get the same sequence on each input. Then use that as your join key.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
hitmanthesilentassasin
Participant
Posts: 150
Joined: Tue Mar 13, 2007 1:17 am

Post by hitmanthesilentassasin »

But that will join me the first row of table 1 with the first row of table 2 and second row of table 2 with second row of table 2. Look at the below example for more details:

Table1
--------------------------
Col1 || Col2 || Col3
1 || A || 0608
1 || B || 0708
1 || C || 0808


Table 2
--------------------------
Col1 || Col2 || Col3
1 || A || 0615
1 || C || 0915
1 || B || 0715

In the above example the sequence of the second row is changed but the second row of the table 1 should be matched with the third row of the table 2.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:? You can only know that if you join on the two key columns not one. Otherwise it will need to be purely positional.
-craig

"You can never have too many knives" -- Logan Nine Fingers
hitmanthesilentassasin
Participant
Posts: 150
Joined: Tue Mar 13, 2007 1:17 am

Post by hitmanthesilentassasin »

It can neither be positional nor on both the keys. One way is to join on the first key and then filter out the data but by doing so i would have the cartesian product created and then the data would be filtered out. So, I want to avoid the cartesian product.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You "avoid the cartesian product" by joining on the proper keys. Why in the world can you not simply do that here? :?

If you "can't do that" for some reason, I really don't see how you can avoid the cartesian-then-filter approach. However, you could arrange for all of that to happen in the source sql, I would imagine.
But that will join me the first row of table 1 with the first row of table 2 and second row of table 1 with second row of table 2.
I also wanted to point out that Ray gave you the solution that he did because this is exactly what you asked for in the original post.
-craig

"You can never have too many knives" -- Logan Nine Fingers
hitmanthesilentassasin
Participant
Posts: 150
Joined: Tue Mar 13, 2007 1:17 am

Post by hitmanthesilentassasin »

Thanks for the replies. It seems I am not sure in my first post.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You either want Cartesian products, as one reading shows, or you don't, as another reading shows. But there is no way to "avoid" Cartesian product if what you need is the result of a Cartesian product.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply