Multiple Join stages with Same Join Key

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
SwathiCh
Premium Member
Premium Member
Posts: 64
Joined: Mon Feb 08, 2010 7:17 pm

Multiple Join stages with Same Join Key

Post by SwathiCh »

Hi All,

I need to join more than 5 tables from different databases on same key. My current job design is - using three join stages to join all these tables in same job.

Is it suggestible to have one join stage in design to join all tables at once or joining two tables with join stage and remaining two tables with other join stage by keeping the same partition?

If I use three join stages to join all these tables, I think I wont overload scratch space (As it is parallel, it will free up the scratch space as data is moving to next stage). In this design, off course it will create more processes but I feel it give more performance.

Please share your ideas/suggestions on this design.

Thanks..
--
Swathi Ch
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Join stage does support more than two inputs (all must have the same join key). Execution is pairwise, as it is in most database servers.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SwathiCh
Premium Member
Premium Member
Posts: 64
Joined: Mon Feb 08, 2010 7:17 pm

Post by SwathiCh »

Thank you Ray.

That means adding multiple join stages won't give any performance improvement in this case, right?
--
Swathi Ch
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Hard to tell. More stages may result in more processes, which may help throughput (assuming operator combination to be disabled and a server that's not overloaded).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vamsi.4a6
Participant
Posts: 334
Joined: Sun Jan 22, 2012 7:06 am
Contact:

Post by vamsi.4a6 »

Let us say i have one join stage with four inputs (Table1,Table2,Table3,Table4).Which two tables will be joined first and how datastage will decide since u mentioned Execution is pairwise.
Thanks and Regards
Vamsi krishna.v
http://datastage-vamsi.blogspot.in/
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

That's driven by link execution order. Left with first intermediate -> result1, then result1 with second intermediate -> result2, then result2 with Right.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply