Difference Between join, Lookup and merge

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
anilkumark802
Participant
Posts: 1
Joined: Wed Oct 13, 2010 10:30 am

Difference Between join, Lookup and merge

Post by anilkumark802 »

Hi I have some problem with join,lookup and merge stages. I am giving my requirement below.

I have two seq files each of which are having one column with name ROWNUM, and in both the file data is 3 records like below
100
100
100

if i use join stage to join these two files and performing innerjoin and left outerjoin, for both operations it is giving out put 9 records like.
100
100
100
100
100
100
100
100
100

if i use Lookup stage and given options as drop, continue. for both the operations it is giving out put 3 records.
100
100
100

and if use merge stage and given option as drop then it is giving 3 records as output like
100
100
100

and if i use option as keep then it is giving 5 records as output like.
100
100
100
100
100

I think all the 3 stages should produce same result why the 3 stages join, lookup and merge are giving different results.
Can any one help me please.
AnilKumar K
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

How many nodes are you running these jobs on, are they using the same configuration file and on which column are you partitioning your data?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Never mind the fact that, because of the nature of your data, the join is giving you a cartesian product.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

For starters, Merge consumes rows from its reference (update) input(s). The other two stage types don't. Your Keep rule iss giving variations because you end up with unmatched masters after all the reference rows have been consumed.

So what do you actually want the output to be? Both results you got are correct, because the specifications were different.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
swapnilverma
Participant
Posts: 135
Joined: Tue Aug 14, 2007 4:27 am
Location: Mumbai

Post by swapnilverma »

to simplify you are joining , merging , performing lookup with two identical files..... or two same files ...

if u use two different files you will see some more magic ..


if they produce same results then why we had three diff stages...
in some cases they may produce same results ..

Join -> will give you cartesian product( set of matching recs ). as ur file is identical its gives you 3*3 = 9 rows.

Merge-> reference record is consumed ( once a matching ip found )
typically used to populate master -child records ... and when you need non matching records from reference ( as it allows a reject link )


Lookup -> no cartesian product ... to populate a column based on matching keys
Thanks
Swapnil

"Whenever you find whole world against you just turn around and Lead the world"
arunpramanik
Participant
Posts: 63
Joined: Fri Jun 22, 2007 7:27 am
Location: Kolkata

Post by arunpramanik »

Best thing will to add non key fields in both of your source, and get the output containing key and non-key fields from these three stage.

Hope that will make you understand how these three stage works
Umbix0562
Participant
Posts: 4
Joined: Tue Nov 30, 2010 3:49 am

Post by Umbix0562 »

anilkumark802 wrote:Hi I have some problem with join,lookup and merge stages. I am giving my requirement below.

I have two seq files each of which are having one column with name ROWNUM, and in both the file data is 3 records like below
100
100
100

if i use join stage to join these two files and performing innerjoin and left outerjoin, for both operations it is giving out put 9 records like.
100
100
100
100
100
100
100
100
100

if i use Lookup stage and given options as drop, continue. for both the operations it is giving out put 3 records.7
100
100
100

and if use merge stage and given option as drop then it is giving 3 records as output like
100
100
100

and if i use option as keep then it is giving 5 records as output like.
100
100
100
100
100

I think all the 3 stages should produce same result why the 3 stages join, lookup and merge are giving different results.
Can any one help me please.
-------------

Good morning my dear friend

I think that it's only a logical problem and I try to explain it by an example. We have two tables with tree rows in the first and three rows in the second one.

Table One Table Two
Column A Column B
100 100
100 100
100 100

I develop the following SQL query

SELECT COLUMN A FROM One, Two WHERE A = B

For each existing row in Table One the query macth three rows satisfating the predicate A=B. So for the first row in table one match three rows in table two. This is the reason explaining
an output with nine rows. The Join Stage works in the same way (this is true both for the inner and left join).

The lookup stage instead works differently. It reads n rows and produce exactly the same number of row in output (if they exist in the reference link).

About the tirth Stage, the Merge one, I don't know it very well but I suppose that in the first case it distincts (aggregates) the rows and not in the second.

I hope that my explanation are clear.

Have a nice day

Umberto
Post Reply