join,merge, lookup

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
tostay2003
Participant
Posts: 97
Joined: Tue Feb 21, 2006 6:45 am

join,merge, lookup

Post by tostay2003 »

how to decide which to use and when. The document says use join when the physical memory is less or else use lookup. merge is used when all the inputs are datasets. My doubt is

a) how to assess practically that the physical memory is less to take join stage.
b) when to use merge? shouldnt we use join or lookup when the inputs are datasets?

-Beginner to DS
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

You gotta try it to know it!!
I've experienced it and know the manual is right :D
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
tostay2003
Participant
Posts: 97
Joined: Tue Feb 21, 2006 6:45 am

Post by tostay2003 »

:D nah i too agree that the manual is right, say a given condition, how do i measure the physical memory?

what about the merge stage? should it be used when v r using datasets and should others not be used in this context?
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Merge can be used ideally for anything .. A seq file, a Dataset, etc...
Anything with data can be used to merge. Although I have not tried everything. But the manual would say which ones are not compatible with merge stage.
Join and Lookup stage are classically used to make joins as it says by name.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
tostay2003
Participant
Posts: 97
Joined: Tue Feb 21, 2006 6:45 am

Post by tostay2003 »

merge as well acts to perform join (correct me if i am wrong). I got from the manual when to use join and lookup. But if merge can be used ideally, lets say most of the sources. Then which to prefer?

Thanks for response & and ur patience
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

There are various other criteria. The decision should really depend on what functionality you need - the memory savings of the Join and Merge stages are countered by the need to sort upstream (which eats most of the saved memory!).

Lookup stage gives you conditional lookup. It allows condition failures and lookup failures to be directed to a rejects link, but there is only one rejects link.

Join stage performs a join. Like SQL join, there is no concept of a reject and therefore the Join stage does not support a rejects link.

Do you need to preserve the key columns from the secondary inputs, or is it OK that they be consumed? That, too, would affect your choice of stage type.

Merge stage lets you separately capture the failed "lookups" from multiple update Data Sets. This might best suit your requirements.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
tostay2003
Participant
Posts: 97
Joined: Tue Feb 21, 2006 6:45 am

Post by tostay2003 »

thank you
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Some generic difference among the three.

Code: Select all

                        Join            Lookup                 Merge

Description            RDBMS-style         Source and lookup     Master table and one
                       Relational tables   table in RAM          or more update tables

Number and names       2 or more inputs    1 source and N lookup 1 master tables and N
Of inputs                                  tables                update tables

Memory usage          Light                Heavy                 Light

Handling of           Ok, produces a       OK                    Warning given 
duplicates in         Cross-product                              Duplicates will be an 
primary input                                                    unmatched primary.

Handling of           Ok, produces a       Warning given. The    Ok only when N =1 
duplicates in         Cross-product        second lookup table
secondary input                            entry is ignored.

Options on 
Unmatched             None                 None                  Capture in reject sets.
secondary

On match, secondary   reusable             reusable             reusable
entriesare

Number of outputs     1                    1 output and         1 output and 1 reject for
                                           optionaly 1 reject   each update table


Captured in reject    Does not apply       Unmatched primary       Unmatched Secondary entries
Set                                        entries
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hmm... the add by the side take up some space, which cause the alignment to go off.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
Post Reply