2 Joins better than 1
I had to left-outer join three source flat files, all pre-sorted on the join keys. I developed Job A, which first joins sources 1 and 2 and then joins that result with source 3. I also developed Job B, which joins sources 1, 2 and 3 in a single Join stage. Job A runs twice as fast as Job B. Does anyone know why? I am just trying to understand the inner workings of the Join stage. Thanks for reading...
Add a Sort stage before the Join on each input link, specify the sort key there, and set it to "Don't sort, data previously sorted" so that the Join stage does not have to insert a sort operator of its own.
You can print out and compare the score with APT_DUMP_SCORE.
You could also add APT_NO_SORT_INSERTION so that no "hidden" tsort operators are inserted into your stream.
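To show why an inserted sort is wasted work on pre-sorted links, here is a minimal Python sketch of a left outer merge join. It is a conceptual illustration only, not the Join stage's actual implementation; the function name, the `left_key`/`right_key` parameters and the sample rows are all made up for the example. It assumes both inputs really are sorted ascending on the join key.

```python
from itertools import groupby
from operator import itemgetter


def left_outer_merge_join(left, right,
                          left_key=itemgetter(0), right_key=itemgetter(0)):
    """Left outer join of two iterables that are ALREADY sorted on their keys.

    Hypothetical helper for illustration. Each input is read once, front to
    back; only the current group of right-hand rows sharing one key is kept
    in memory, so no sort is needed at any point.
    """
    groups = groupby(right, key=right_key)
    cur_key, cur_rows = None, []
    exhausted = False

    for row in left:
        k = left_key(row)
        # Advance the right side until its key catches up with the left key.
        while not exhausted and (cur_key is None or cur_key < k):
            try:
                cur_key, grp = next(groups)
                cur_rows = list(grp)
            except StopIteration:
                exhausted = True
                cur_key, cur_rows = None, []
        if not exhausted and cur_key == k:
            for match in cur_rows:
                yield row, match
        else:
            # No partner on the right: keep the left row (outer join).
            yield row, None


# Job A style: join sources 1 and 2, then join that result with source 3.
src1 = [(1, "a"), (2, "b"), (3, "c")]
src2 = [(1, "x"), (3, "z")]
src3 = [(2, "p"), (3, "q")]

step1 = left_outer_merge_join(src1, src2)
result = list(left_outer_merge_join(step1, src3,
                                    left_key=lambda pair: pair[0][0]))
```

If the engine cannot tell that a link is already sorted, it has to sort it before a pass like this can start, and that extra sort is the kind of hidden cost the score dump should reveal.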
ArndW wrote:
add a sort stage before the join to each link where you specify the sort key and add "Don't sort, data previously sorted" so that the join stage does not have to insert a sort operator.
You can print out and compare the score with APT_DUMP_SCORE.
You could also add APT_NO_SORT_INSERTION so that no "hidden" tsort operators are inserted into your stream.

How do I get the APT_DUMP_SCORE and add the APT_NO_SORT_INSERTION?