Page 1 of 1
join,merge, lookup
Posted: Thu Mar 09, 2006 11:07 am
by tostay2003
how to decide which to use and when. The document says use join when the physical memory is less or else use lookup. merge is used when all the inputs are datasets. My doubt is
a) how to assess practically that the physical memory is less to take join stage.
b) when to use merge? shouldnt we use join or lookup when the inputs are datasets?
-Beginner to DS
Posted: Thu Mar 09, 2006 11:09 am
by DSguru2B
You gotta try it to know it!!
I've experienced it and know the manual is right :D
Posted: Thu Mar 09, 2006 11:24 am
by tostay2003
:D nah i too agree that the manual is right, say a given condition, how do i measure the physical memory?
what about the merge stage? should it be used when v r using datasets and should others not be used in this context?
Posted: Thu Mar 09, 2006 11:49 am
by DSguru2B
Merge can be used ideally for anything .. A seq file, a Dataset, etc...
Anything with data can be used to merge. Although I have not tried everything. But the manual would say which ones are not compatible with merge stage.
Join and Lookup stage are classically used to make joins as it says by name.
Posted: Thu Mar 09, 2006 12:05 pm
by tostay2003
merge as well acts to perform join (correct me if i am wrong). I got from the manual when to use join and lookup. But if merge can be used ideally, lets say most of the sources. Then which to prefer?
Thanks for response & and ur patience
Posted: Thu Mar 09, 2006 12:06 pm
by ray.wurlod
There are various other criteria. The decision should really depend on what functionality you need - the memory savings of the Join and Merge stages are countered by the need to sort upstream (which eats most of the saved memory!).
Lookup stage gives you conditional lookup. It allows condition failures and lookup failures to be directed to a rejects link, but there is only one rejects link.
Join stage performs a join. Like SQL join, there is no concept of a reject and therefore the Join stage does not support a rejects link.
Do you need to preserve the key columns from the secondary inputs, or is it OK that they be consumed? That, too, would affect your choice of stage type.
Merge stage lets you separately capture the failed "lookups" from multiple update Data Sets. This might best suit your requirements.
Posted: Fri Mar 10, 2006 7:56 am
by tostay2003
thank you
Posted: Sat Mar 11, 2006 3:08 am
by kumar_s
Some generic difference among the three.
Code: Select all
Join Lookup Merge
Description RDBMS-style Source and lookup Master table and one
Relational tables table in RAM or more update tables
Number and names 2 or more inputs 1 source and N lookup 1 master tables and N
Of inputs tables update tables
Memory usage Light Heavy Light
Handling of Ok, produces a OK Warning given
duplicates in Cross-product Duplicates will be an
primary input unmatched primary.
Handling of Ok, produces a Warning given. The Ok only when N =1
duplicates in Cross-product second lookup table
secondary input entry is ignored.
Options on
Unmatched None None Capture in reject sets.
secondary
On match, secondary reusable reusable reusable
entriesare
Number of outputs 1 1 output and 1 output and 1 reject for
optionaly 1 reject each update table
Captured in reject Does not apply Unmatched primary Unmatched Secondary entries
Set entries
Posted: Sat Mar 11, 2006 3:10 am
by kumar_s
Hmm... the add by the side take up some space, which cause the alignment to go off.