join stage

gbusson · Post by **gbusson** » Thu Jun 08, 2006 3:08 am

Hi all,

What is the performance impact of the order of the input links in a Join stage when you do an outer join?

Should the "biggest" link be the left or the right link, or is performance the same regardless of the configuration?

Thanks for your help.

thompsonp · Post by **thompsonp** » Thu Jun 08, 2006 5:51 am

I don't think performance is the issue; rather, the result set will differ.

You should choose the 'correct' option of Left, Right, Full Outer or Inner based on your requirements and ensure that you have the inputs set correctly on the Link Ordering tab.
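To show that the join type changes the result set rather than just the speed, here is a minimal Python sketch of the four join types on two tiny in-memory inputs. The function `join`, the row layout (lists of dicts) and the sample data are hypothetical, and this is a naive nested-loop join for illustration, not how the DataStage Join stage is implemented.

```python
from typing import Any

Row = dict[str, Any]


def join(left: list[Row], right: list[Row], key: str, how: str) -> list[Row]:
    """Nested-loop join on one key column; `how` is 'inner', 'left', 'right' or 'full'.
    Columns missing on the unmatched side are filled with None."""
    left_cols = {c for r in left for c in r}
    right_cols = {c for r in right for c in r}
    out: list[Row] = []
    matched_right: set[int] = set()
    for lrow in left:
        hits = [i for i, r in enumerate(right) if r[key] == lrow[key]]
        for i in hits:
            out.append({**lrow, **right[i]})
            matched_right.add(i)
        # Unmatched left rows survive only in left and full outer joins.
        if not hits and how in ("left", "full"):
            out.append({**{c: None for c in right_cols}, **lrow})
    # Unmatched right rows survive only in right and full outer joins.
    if how in ("right", "full"):
        for i, rrow in enumerate(right):
            if i not in matched_right:
                out.append({**{c: None for c in left_cols}, **rrow})
    return out


flow = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
ref = [{"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
for how in ("inner", "left", "right", "full"):
    print(how, len(join(flow, ref, "id", how)), "rows")
```

With the same two inputs, the inner, left, right and full outer joins return 1, 2, 2 and 3 rows respectively, which is why the choice has to follow the requirements.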

kumar_s · Post by **kumar_s** » Thu Jun 08, 2006 8:12 am

No such comparison has been officially given, at least for the Join stage.

sud · Post by **sud** » Thu Jun 08, 2006 8:16 am

Hi,

A join requires all the inputs to be pre-sorted or to be of manageable size; this is the advice from Ascential. However, whenever you have performance issues with a Lookup stage, use a join.
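Why pre-sorting matters: when both inputs arrive sorted on the join key, a join can be computed in a single forward pass over each input, and an unmatched row is recognised as soon as the merge moves past its key. Below is a minimal Python sketch of a generic sort-merge left outer join under that assumption; the function name `merge_left_outer` and the row layout (lists of dicts, ascending order on one key) are hypothetical, and this is not the Join stage's actual code.

```python
from typing import Any, Iterator

Row = dict[str, Any]


def merge_left_outer(left: list[Row], right: list[Row], key: str) -> Iterator[Row]:
    """Left outer join of two inputs that are ALREADY sorted ascending on `key`.
    Both pointers only move forward, so each input is read once; duplicate keys
    produce the cross product of the two matching groups."""
    right_cols = {c for r in right for c in r}
    i, j = 0, 0
    while i < len(left):
        k = left[i][key]
        # Skip right rows whose key is smaller than the current left key.
        while j < len(right) and right[j][key] < k:
            j += 1
        # Collect the group of right rows that share key k.
        g = j
        while g < len(right) and right[g][key] == k:
            g += 1
        group = right[j:g]
        # Emit every left row with key k, matched or None-filled.
        while i < len(left) and left[i][key] == k:
            if group:
                for r in group:
                    yield {**left[i], **r}
            else:
                yield {**{c: None for c in right_cols}, **left[i]}
            i += 1
        j = g


flow = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 4, "a": "w"}]
ref = [{"id": 2, "b": 20}, {"id": 3, "b": 30}, {"id": 4, "b": 40}]
for row in merge_left_outer(flow, ref, "id"):
    print(row)
```

If the inputs are not sorted on the key, this logic silently produces wrong matches, which is the practical reason the sort requirement exists.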

Now, for a Join stage there are implications to using an outer join, simply because for every record whose match is not found a full table scan is on the cards. But if the business logic requires an outer join, that has to be done. There are no directives from Ascential regarding any link preference, so to answer your question, it does not matter which link you use as left and which as right.

Sudarshan

gbusson · Post by **gbusson** » Thu Jun 08, 2006 8:17 am

To be more precise,

I wonder why there are two options (left and right outer join), since you can easily change the order of the links.

kumar_s · Post by **kumar_s** » Thu Jun 08, 2006 8:20 am

Have you found the reason why it is available in SQL?

sud · Post by **sud** » Thu Jun 08, 2006 8:20 am

gbusson wrote: To be more precise,

I wonder why there are two options (left and right outer join), since you can easily change the order of the links.

In the case of, say, a left outer join, there has to be a mechanism to specify which link you want to be treated as the left one.
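The two options are interchangeable in terms of the rows returned: a right outer join of inputs A and B keeps the same rows as a left outer join of B and A. A minimal Python sketch that implements both independently and checks this; the function names and the dict-based row layout are hypothetical, and only the set of columns (not their order) is compared.

```python
from typing import Any

Row = dict[str, Any]


def left_outer(left: list[Row], right: list[Row], key: str) -> list[Row]:
    """Keep every row of `left`; attach matching `right` rows or None-filled columns."""
    right_cols = {c for r in right for c in r}
    out: list[Row] = []
    for lrow in left:
        matches = [r for r in right if r[key] == lrow[key]]
        if matches:
            out.extend({**lrow, **r} for r in matches)
        else:
            out.append({**{c: None for c in right_cols}, **lrow})
    return out


def right_outer(left: list[Row], right: list[Row], key: str) -> list[Row]:
    """Keep every row of `right`; attach matching `left` rows or None-filled columns."""
    left_cols = {c for r in left for c in r}
    out: list[Row] = []
    for rrow in right:
        matches = [lrow for lrow in left if lrow[key] == rrow[key]]
        if matches:
            out.extend({**lrow, **rrow} for lrow in matches)
        else:
            out.append({**{c: None for c in left_cols}, **rrow})
    return out


a = [{"id": 1, "x": "a1"}, {"id": 2, "x": "a2"}]
b = [{"id": 2, "y": "b2"}, {"id": 3, "y": "b3"}]
# Dict equality ignores key order, so this compares row contents only.
print(right_outer(a, b, "id") == left_outer(b, a, "id"))
```

So offering both options is a convenience: you either pick the join type or reorder the links, and the result is the same.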

gbusson · Post by **gbusson** » Thu Jun 08, 2006 8:24 am

I know that!

My question is: is there a best practice for deciding which link should be the left one?

I have also worked with Informatica, and there performance was affected by which link was the left one.

kumar_s · Post by **kumar_s** » Thu Jun 08, 2006 8:28 am

If you perform a lookup, yes, there will certainly be a difference in performance: the reference link should have fewer rows than the main stream. But as far as I know, that does not apply to the Join stage.
What is the situation in Informatica?

sud · Post by **sud** » Thu Jun 08, 2006 8:30 am

gbusson wrote: I know that!

My question is: is there a best practice for deciding which link should be the left one?

I have also worked with Informatica, and there performance was affected by which link was the left one.

My friend, you are correct: the concept in Informatica is that the primary link will be cached, and hence the developer has to be sure that it will fit into memory. The same applies to the DataStage Lookup stage, where the reference link should fit in memory. For a Join stage, however, there are no such directives known so far.

thompsonp · Post by **thompsonp** » Fri Jun 09, 2006 6:10 am

The guideline for DataStage is that the reference (lookup) data for a Lookup stage should fit in memory.

All the lookup data is loaded into memory before any of the primary data can be processed. It has to work like that unless the lookup data were partitioned and sorted on the lookup keys (which would make it a join).

The advantage of using a lookup over a join is that the primary data need not be partitioned and sorted on the lookup keys. If the primary dataset is large then not having to repartition and sort it can be a big time saving.
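The two-phase behaviour described above (load all the lookup data first, then stream the primary data in whatever order it arrives) can be sketched as an in-memory hash lookup. A minimal Python sketch, assuming unique reference keys and rows stored as dicts; `hash_lookup` is a hypothetical name, and passing unmatched primary rows through unchanged is just one possible lookup-failure policy.

```python
from typing import Any, Iterable, Iterator

Row = dict[str, Any]


def hash_lookup(primary: Iterable[Row], reference: Iterable[Row], key: str) -> Iterator[Row]:
    # Phase 1: load the whole reference input into memory before any primary row is read.
    # Memory use grows with the size of the reference input.
    table: dict[Any, Row] = {}
    for r in reference:
        table[r[key]] = r  # assumes unique keys; a later duplicate overwrites an earlier one
    # Phase 2: stream the primary input in arrival order; no partitioning or sorting needed.
    for p in primary:
        match = table.get(p[key])
        if match is not None:
            yield {**p, **match}
        else:
            yield dict(p)  # unmatched: passed through unchanged


ref = [{"id": 3, "name": "c"}, {"id": 1, "name": "a"}]       # unsorted
flow = [{"id": 1, "amt": 5}, {"id": 2, "amt": 7}, {"id": 3, "amt": 9}]
for row in hash_lookup(flow, ref, "id"):
    print(row)
```

The trade-off is the one in the post: the primary side is never sorted, but the entire reference side must fit in RAM.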

If you have been on the Ascential training course, take a look at the slide that shows the Join, Lookup and Merge stages in a table along with their requirements for inputs, outputs, partitioning, sorting and reject options. That is always a good starting point if you don't know which one to use.

And before anyone asks: sorry, no, I can't post a copy; it's copyrighted.

gbusson · Post by **gbusson** » Fri Jun 09, 2006 6:16 am

Thank you all...

The join is inevitable:

flow: 300,000 rows

ref: 1,000,000 rows

Moreover, this is NOT the only project on the server, and I'd like to leave some RAM available.
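A back-of-the-envelope estimate shows why a 1,000,000-row reference link matters for RAM when it is held in memory by a lookup. A minimal Python sketch; the 200-byte row width is a purely hypothetical figure (the real width depends on the column definitions), and the result ignores per-row and hash-table overhead, so it is a lower bound.

```python
def reference_memory_mib(rows: int, bytes_per_row: int) -> float:
    """Rough lower bound on memory needed to hold a reference input in RAM."""
    return rows * bytes_per_row / (1024 * 1024)


# Hypothetical 200-byte rows; substitute the actual record width of the job.
print(round(reference_memory_mib(1_000_000, 200), 1))  # reference link: ~190.7 MiB
print(round(reference_memory_mib(300_000, 200), 1))    # flow link: ~57.2 MiB
```

A join on sorted inputs does not need the whole reference side in memory at once, which is the point of choosing it here.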

gbusson · Post by **gbusson** » Fri Jun 09, 2006 6:18 am

Another question:

Do you know how to append data to a lookup fileset?

kumar_s · Post by **kumar_s** » Fri Jun 09, 2006 8:29 am

There is no option to append, but you can read the data from the lookup fileset, funnel it with the additional dataset, and load it back into the same lookup fileset.
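The read, funnel and write-back approach is a general pattern for storage that only supports overwrite. A minimal Python sketch, assuming a plain CSV file stands in for the lookup fileset (it does not read or write the DataStage fileset format); the function name `append_by_rewrite` and the file layout are hypothetical.

```python
import csv
import os
import tempfile


def append_by_rewrite(path: str, new_rows: list[dict[str, str]], fieldnames: list[str]) -> None:
    """Emulate 'append' on a store that only supports overwrite:
    read the existing rows, combine ("funnel") them with the new ones, write everything back."""
    existing: list[dict[str, str]] = []
    if os.path.exists(path):
        with open(path, newline="") as f:
            existing = list(csv.DictReader(f))
    # Write to a temporary file in the same directory, then swap it in,
    # so the original stays intact if writing fails partway.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(existing + new_rows)
    os.replace(tmp, path)
```

Usage: call `append_by_rewrite("ref.csv", [{"id": "2", "name": "b"}], ["id", "name"])` once per batch. Each call rewrites the whole file, so the cost grows with the size of the existing data, not just the new rows.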

gbusson · Post by **gbusson** » Fri Jun 09, 2006 9:12 am

The lookup fileset is a fileset.

Can we access it via the File Set stage (which has an append option)?

DSXchange