Page 1 of 1

Multiple Join stages with Same Join Key

Posted: Wed Dec 11, 2013 9:36 am
by SwathiCh
Hi All,

I need to join more than 5 tables from different databases on same key. My current job design is - using three join stages to join all these tables in same job.

Is it suggestible to have one join stage in design to join all tables at once or joining two tables with join stage and remaining two tables with other join stage by keeping the same partition?

If I use three join stages to join all these tables, I think I wont overload scratch space (As it is parallel, it will free up the scratch space as data is moving to next stage). In this design, off course it will create more processes but I feel it give more performance.

Please share your ideas/suggestions on this design.

Thanks..

Posted: Wed Dec 11, 2013 4:16 pm
by ray.wurlod
Join stage does support more than two inputs (all must have the same join key). Execution is pairwise, as it is in most database servers.

Posted: Wed Dec 11, 2013 6:57 pm
by SwathiCh
Thank you Ray.

That means adding multiple join stages won't give any performance improvement in this case, right?

Posted: Thu Dec 12, 2013 12:10 am
by ray.wurlod
Hard to tell. More stages may result in more processes, which may help throughput (assuming operator combination to be disabled and a server that's not overloaded).

Posted: Thu Dec 12, 2013 7:00 pm
by vamsi.4a6
Let us say i have one join stage with four inputs (Table1,Table2,Table3,Table4).Which two tables will be joined first and how datastage will decide since u mentioned Execution is pairwise.

Posted: Thu Dec 12, 2013 10:18 pm
by ray.wurlod
That's driven by link execution order. Left with first intermediate -> result1, then result1 with second intermediate -> result2, then result2 with Right.