join stage
Moderators: chulett, rschirm, roy
join stage
Hi all,
What is the impact of the order of the input links in a join stage when you do an outer join (in term of performance)?
Should the "biggest" link be the left or right link, or is the performance equal , regardless the configuration?
Thanks for your help.
What is the impact of the order of the input links in a join stage when you do an outer join (in term of performance)?
Should the "biggest" link be the left or right link, or is the performance equal , regardless the configuration?
Thanks for your help.
Re: join stage
Hi,
A join requires all the input to be pre-sorted or to be of manageable size, this is the advice from ascential. However whenever you have performance issues with a lookup stage use a join.
Now, for a join stage there are implications of using an outer join simply because for all records whose match is not found a full table scan is on the cards. But if the business logic requires an outer join - that has to be done. There are no dirctives from ascential regarding any link preference - so, to answer your question it does not matter which link you use as left and which as right.
Sudarshan
A join requires all the input to be pre-sorted or to be of manageable size, this is the advice from ascential. However whenever you have performance issues with a lookup stage use a join.
Now, for a join stage there are implications of using an outer join simply because for all records whose match is not found a full table scan is on the cards. But if the business logic requires an outer join - that has to be done. There are no dirctives from ascential regarding any link preference - so, to answer your question it does not matter which link you use as left and which as right.
Sudarshan
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
In case of say a left outer join - there has to be a mechanism to specify which link you want to be considered as left.gbusson wrote:to be more precise,
i wonder why there are 2 options (left and right outer join) as you can change the order of the links easily.
![Smile :)](./images/smilies/icon_smile.gif)
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
IF you perform a lookup, yes there will be certainly diference in performance. The one used for reference should be less in number when compared to the main steram. But not in join stage as for as I know.
May I know what is the case of Informatica?
May I know what is the case of Informatica?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
My friend you are correct -- the concept in Informatica is that the primary link will be cached and hence the developer has to be sure that it will fit into memory. This is applicable in the case of Lookup stage wherein the primary link should fit in memory. For a join stage however there are no such directives known so far.gbusson wrote:i know that!
My question is : is there a best pratice to know which link is to be considered as the left one?
I worked with Informatica also and performance was impacted depending of which link was the left one.
![Smile :-)](./images/smilies/icon_smile.gif)
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
The guideline for DataStage is that the lookup data to a lookup stage should fit in memory.
All the lookup data is loaded into memory before any of the primary data can be processed. It has to work like that unless the lookup data were partitioned and sorted on the lookup keys (which would be join).
The advantage of using a lookup over a join is that the primary data need not be partitioned and sorted on the lookup keys. If the primary dataset is large then not having to repartition and sort it can be a big time saving.
If you have been on the Ascential training course take a look at the slide that shows the Join, Lookup and Merge stages in a table along with their erquirements for inputs, outputs, partitioning, sorting and reject options. That is always a good starting point if you don't know which one to use.
And before anyone asks, sorry no I can't post a copy - it's copyrighted
All the lookup data is loaded into memory before any of the primary data can be processed. It has to work like that unless the lookup data were partitioned and sorted on the lookup keys (which would be join).
The advantage of using a lookup over a join is that the primary data need not be partitioned and sorted on the lookup keys. If the primary dataset is large then not having to repartition and sort it can be a big time saving.
If you have been on the Ascential training course take a look at the slide that shows the Join, Lookup and Merge stages in a table along with their erquirements for inputs, outputs, partitioning, sorting and reject options. That is always a good starting point if you don't know which one to use.
And before anyone asks, sorry no I can't post a copy - it's copyrighted