Merge or Join: which is more efficient?

Munish
Participant
Posts: 89
Joined: Sun Nov 19, 2006 10:34 pm

Post by Munish »

Hi All,
My job can be done with either a Merge stage or a Join stage.

I was just wondering which one is more efficient, considering the data set may have around 300 million rows.

Thanks,
Munish
MK
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Define "efficient".

They use very similar algorithms for memory management.

Do you need just a join, or do you need to be able to capture, separately, rows for which there was no "update" available? The decision is largely driven by functionality.

Why not perform some benchmarks and document your results here?
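
To make the functional question concrete, here is a minimal Python sketch (not DataStage code) of the difference between a plain inner join and a join that also captures unmatched rows separately. The function name, the column names, and the sample rows are all hypothetical, and the sketch assumes each key appears at most once in the update input.

```python
def join_with_rejects(master_rows, update_rows, key):
    """Inner-join two lists of dicts on `key` and also return the
    unmatched rows from each side (illustration only)."""
    # Assumption: keys are unique within update_rows.
    updates_by_key = {row[key]: row for row in update_rows}
    joined, unmatched_master = [], []
    matched_keys = set()
    for m in master_rows:
        u = updates_by_key.get(m[key])
        if u is None:
            # No "update" row exists for this master row.
            unmatched_master.append(m)
        else:
            joined.append({**m, **u})
            matched_keys.add(m[key])
    # Update rows whose key never appeared in the master input.
    unmatched_update = [u for u in update_rows if u[key] not in matched_keys]
    return joined, unmatched_master, unmatched_update


# Hypothetical sample data.
master = [{"id": 1, "total": 10}, {"id": 2, "total": 20}]
update = [{"id": 2, "cnt": 5}, {"id": 3, "cnt": 7}]
joined, no_update, no_master = join_with_rejects(master, update, "id")
```

If only `joined` is needed, a plain join is enough; if the two unmatched lists matter, that requirement is what should drive the choice of stage.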
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Munish
Participant
Posts: 89
Joined: Sun Nov 19, 2006 10:34 pm

Post by Munish »

Sure, I will definitely do that.
It is one of the things to do in our SVT.

In the current job:
Source 1: key values + sum
Source 2: key values + count

Join

Output: key values + sum + count

Thus, it is a very simple inner join.
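
As a rough illustration of the job described above, the following Python sketch joins two key-sorted streams row by row, so neither input has to be held in memory. It is only a conceptual model, not what either stage does internally. It assumes both inputs are already sorted ascending on a unique key; the function name and the sample values are made up.

```python
def sorted_inner_join(sums, counts):
    """Yield (key, sum, count) for keys present in both inputs.

    Both inputs are iterables of (key, value) pairs, assumed to be
    sorted ascending by key with no duplicate keys."""
    sums, counts = iter(sums), iter(counts)
    s = next(sums, None)
    c = next(counts, None)
    while s is not None and c is not None:
        if s[0] == c[0]:
            # Keys match: emit the combined row and advance both sides.
            yield (s[0], s[1], c[1])
            s = next(sums, None)
            c = next(counts, None)
        elif s[0] < c[0]:
            # Key exists only in source 1: skip it (inner join).
            s = next(sums, None)
        else:
            # Key exists only in source 2: skip it (inner join).
            c = next(counts, None)


# Hypothetical sample data: Source 1 = key + sum, Source 2 = key + count.
source1 = [("A", 100.0), ("B", 250.0), ("D", 40.0)]
source2 = [("A", 4), ("C", 9), ("D", 2)]
result = list(sorted_inner_join(source1, source2))
```

Because each input is read once in key order, memory use stays flat regardless of row count, which is the property that matters at 300 million rows.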

We have started development with the Join stage; however, we are going to compare it with the Merge stage once we have the actual 300 million rows of data.

Thanks Ray
Munish
MK
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If source1 and source2 are in the same database, also consider doing it in the SQL, where the query may benefit from the table's indexes and/or statistics.
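
A minimal sketch of the in-database approach, using Python's built-in sqlite3 module purely so the example is runnable. The table names, column names, and sample rows are hypothetical; in a real job the same kind of SELECT would be issued against the actual source database.

```python
import sqlite3

# In-memory database with two hypothetical source tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source1 (key_value TEXT PRIMARY KEY, total REAL);
    CREATE TABLE source2 (key_value TEXT PRIMARY KEY, cnt INTEGER);
    INSERT INTO source1 VALUES ('A', 100.0), ('B', 250.0), ('D', 40.0);
    INSERT INTO source2 VALUES ('A', 4), ('C', 9), ('D', 2);
""")

# The database performs the inner join and can use the key indexes.
rows = conn.execute("""
    SELECT s1.key_value, s1.total, s2.cnt
    FROM source1 AS s1
    INNER JOIN source2 AS s2 ON s2.key_value = s1.key_value
    ORDER BY s1.key_value
""").fetchall()
conn.close()
```

With this approach the job only has to read the already-joined result set, so no Join or Merge stage is needed for this step.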
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

I once ran a similar benchmark, and Join gave me the better result, but I don't have those details right now. I will still wait for your results to be published, Munish.
Make sure the CPU is otherwise idle for every test case. Also measure memory and CPU usage during the operation.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
vijayrc
Participant
Posts: 197
Joined: Sun Apr 02, 2006 10:31 am
Location: NJ

Post by vijayrc »

kumar_s wrote: I once ran a similar benchmark, and Join gave me the better result, but I don't have those details right now. I will still wait for your results to be published, Munish.
Make sure the CPU is otherwise idle for every test case. Also measure memory and CPU usage during the operation.

Any updates on this, Munish? I'm curious too, as I am also debating the use of Merge over Join in certain scenarios in our development. Your results would help me. Thanks.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Search for it. Vincent also has comparisons on his blog; click the link in vmcburney's signature.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.