Page 1 of 1

Join at Extract or later

Posted: Sun Jun 17, 2007 10:11 am
by DSDeveloper
Hello,

I am a newbie in DSS and BI world, have been reading various DW books to improve.

I have a general question regarding the dimensions. OLTP system has a highly normalized structure while DW structures are denormalized.

Using DataStage - When we pull the information from the source system (for the dimensions), the information comes from multiple tables.

Which way is better
1) While pulling the information join the multiple tables and put the information in staging area(RIGHT AT THE SOURCE)
Or
2) Bring the information separately(multiple tables) and then after the information is in staging area then do the joins.

Thanks a million for your time.

Regards
Eric

Posted: Sun Jun 17, 2007 3:13 pm
by ray.wurlod
Welcome aboard.

Where possible, join at extract. There are two main reasons.

First, the join may be able to be assisted by indexes and so be achieved faster.

Second, the result of the join will contain fewer rows than the unjoined contents of the source tables; this is even more important if these rows must be transmitted over a network. Fewer rows will be processed faster.