join,merge, lookup

tostay2003 · Post by **tostay2003** » Thu Mar 09, 2006 11:07 am

how to decide which to use and when. The document says use join when the physical memory is less or else use lookup. merge is used when all the inputs are datasets. My doubt is

a) how to assess practically that the physical memory is less to take join stage.
b) when to use merge? shouldnt we use join or lookup when the inputs are datasets?

-Beginner to DS

DSguru2B · Post by **DSguru2B** » Thu Mar 09, 2006 11:09 am

You gotta try it to know it!!
I've experienced it and know the manual is right :D

tostay2003 · Post by **tostay2003** » Thu Mar 09, 2006 11:24 am

:D nah i too agree that the manual is right, say a given condition, how do i measure the physical memory?

what about the merge stage? should it be used when v r using datasets and should others not be used in this context?

DSguru2B · Post by **DSguru2B** » Thu Mar 09, 2006 11:49 am

Merge can be used ideally for anything .. A seq file, a Dataset, etc...
Anything with data can be used to merge. Although I have not tried everything. But the manual would say which ones are not compatible with merge stage.
Join and Lookup stage are classically used to make joins as it says by name.

tostay2003 · Post by **tostay2003** » Thu Mar 09, 2006 12:05 pm

merge as well acts to perform join (correct me if i am wrong). I got from the manual when to use join and lookup. But if merge can be used ideally, lets say most of the sources. Then which to prefer?

Thanks for response & and ur patience

ray.wurlod · Post by **ray.wurlod** » Thu Mar 09, 2006 12:06 pm

There are various other criteria. The decision should really depend on what functionality you need - the memory savings of the Join and Merge stages are countered by the need to sort upstream (which eats most of the saved memory!).

Lookup stage gives you conditional lookup. It allows condition failures and lookup failures to be directed to a rejects link, but there is only one rejects link.

Join stage performs a join. Like SQL join, there is no concept of a reject and therefore the Join stage does not support a rejects link.

Do you need to preserve the key columns from the secondary inputs, or is it OK that they be consumed? That, too, would affect your choice of stage type.

Merge stage lets you separately capture the failed "lookups" from multiple update Data Sets. This might best suit your requirements.

tostay2003 · Post by **tostay2003** » Fri Mar 10, 2006 7:56 am

thank you

kumar_s · Post by **kumar_s** » Sat Mar 11, 2006 3:08 am

Some generic difference among the three.

Code: Select all

                        Join            Lookup                 Merge

Description            RDBMS-style         Source and lookup     Master table and one
                       Relational tables   table in RAM          or more update tables

Number and names       2 or more inputs    1 source and N lookup 1 master tables and N
Of inputs                                  tables                update tables

Memory usage          Light                Heavy                 Light

Handling of           Ok, produces a       OK                    Warning given 
duplicates in         Cross-product                              Duplicates will be an 
primary input                                                    unmatched primary.

Handling of           Ok, produces a       Warning given. The    Ok only when N =1 
duplicates in         Cross-product        second lookup table
secondary input                            entry is ignored.

Options on 
Unmatched             None                 None                  Capture in reject sets.
secondary

On match, secondary   reusable             reusable             reusable
entriesare

Number of outputs     1                    1 output and         1 output and 1 reject for
                                           optionaly 1 reject   each update table


Captured in reject    Does not apply       Unmatched primary       Unmatched Secondary entries
Set                                        entries

kumar_s · Post by **kumar_s** » Sat Mar 11, 2006 3:10 am

Hmm... the add by the side take up some space, which cause the alignment to go off.