join stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

gbusson
Participant
Posts: 98
Joined: Fri Oct 07, 2005 2:50 am
Location: France
Contact:

join stage

Post by gbusson »

Hi all,

What is the impact of the order of the input links in a join stage when you do an outer join (in term of performance)?

Should the "biggest" link be the left or right link, or is the performance equal , regardless the configuration?

Thanks for your help.
thompsonp
Premium Member
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

I don't think performance is the issue, rather the result set will differ.

You should choose the 'correct' option of Left, Right, Full Outer or Inner based on your requirements and ensure that you have the inputs set correctly on the Link Ordering tab.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

No such comparison is officially give atlest for join stage.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
sud
Premium Member
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Re: join stage

Post by sud »

Hi,

A join requires all the input to be pre-sorted or to be of manageable size, this is the advice from ascential. However whenever you have performance issues with a lookup stage use a join.

Now, for a join stage there are implications of using an outer join simply because for all records whose match is not found a full table scan is on the cards. But if the business logic requires an outer join - that has to be done. There are no dirctives from ascential regarding any link preference - so, to answer your question it does not matter which link you use as left and which as right.

Sudarshan
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
gbusson
Participant
Posts: 98
Joined: Fri Oct 07, 2005 2:50 am
Location: France
Contact:

Post by gbusson »

to be more precise,

i wonder why there are 2 options (left and right outer join) as you can change the order of the links easily.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Have you found the reason why it is available in SQL?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
sud
Premium Member
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Post by sud »

gbusson wrote:to be more precise,

i wonder why there are 2 options (left and right outer join) as you can change the order of the links easily.
In case of say a left outer join - there has to be a mechanism to specify which link you want to be considered as left.

:)
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
gbusson
Participant
Posts: 98
Joined: Fri Oct 07, 2005 2:50 am
Location: France
Contact:

Post by gbusson »

i know that!

My question is : is there a best pratice to know which link is to be considered as the left one?

I worked with Informatica also and performance was impacted depending of which link was the left one.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

IF you perform a lookup, yes there will be certainly diference in performance. The one used for reference should be less in number when compared to the main steram. But not in join stage as for as I know.
May I know what is the case of Informatica?
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
sud
Premium Member
Premium Member
Posts: 366
Joined: Fri Dec 02, 2005 5:00 am
Location: Here I Am

Post by sud »

gbusson wrote:i know that!

My question is : is there a best pratice to know which link is to be considered as the left one?

I worked with Informatica also and performance was impacted depending of which link was the left one.
My friend you are correct -- the concept in Informatica is that the primary link will be cached and hence the developer has to be sure that it will fit into memory. This is applicable in the case of Lookup stage wherein the primary link should fit in memory. For a join stage however there are no such directives known so far.

:-)
It took me fifteen years to discover I had no talent for ETL, but I couldn't give it up because by that time I was too famous.
thompsonp
Premium Member
Premium Member
Posts: 205
Joined: Tue Mar 01, 2005 8:41 am

Post by thompsonp »

The guideline for DataStage is that the lookup data to a lookup stage should fit in memory.

All the lookup data is loaded into memory before any of the primary data can be processed. It has to work like that unless the lookup data were partitioned and sorted on the lookup keys (which would be join).

The advantage of using a lookup over a join is that the primary data need not be partitioned and sorted on the lookup keys. If the primary dataset is large then not having to repartition and sort it can be a big time saving.

If you have been on the Ascential training course take a look at the slide that shows the Join, Lookup and Merge stages in a table along with their erquirements for inputs, outputs, partitioning, sorting and reject options. That is always a good starting point if you don't know which one to use.

And before anyone asks, sorry no I can't post a copy - it's copyrighted
gbusson
Participant
Posts: 98
Joined: Fri Oct 07, 2005 2:50 am
Location: France
Contact:

Post by gbusson »

thank you all...

The join is inevitable :

flow : 300 000 rows

ref : 1 000 000 rows

moreover, this is NOT the only project on the server, i'd like to let some RAM avalaible.
gbusson
Participant
Posts: 98
Joined: Fri Oct 07, 2005 2:50 am
Location: France
Contact:

Post by gbusson »

other question :

do you know how to append data to a lookup fileset??
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

There is no option to append. But you can read the data from lookup fileset and funnel it with the additional dataset and load it back to the same lookup fileset.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
gbusson
Participant
Posts: 98
Joined: Fri Oct 07, 2005 2:50 am
Location: France
Contact:

Post by gbusson »

the Lookup fileset is a fileset.

Can we acces to it via the fileset stage (there is a append option)?
Post Reply