Page 1 of 1

Steps to take Fact load

Posted: Wed Oct 06, 2010 5:02 am
by Phani01
Can anyone guide me the steps to take to load fact table.
Performance should be good and data load should be error free.

AND

Suppose i want to load 100million records and Job failed after 50 million records.Is there any way to load the rest of the rows(Remaining 50 million) instead of all from scratch.

Any help can be appreciated.

Posted: Wed Oct 06, 2010 5:57 am
by ArndW
1. "Performance should be good". This topic is far too complex to discuss in a single thread. It depends upon so many different factors ranging from performance expectations (e.g. I have a window of 3 hours to load my data and it must finish within that time) to database SQL optimization with literally hundreds of other factors to put into the equation.

2. "data load should be error free". That has nothing to do with steps to take but is a matter of job design and implementation and mainly of very good planning.

3. "Suppose i want to load 100million records and Job failed after 50 million...". Yes, you can code this several different ways. How you implement this restartability is heavily dependant upon what sort of writes you are doing ("inserts" only vs. "inserts" and "updates"" to a table, the former being easier and the latter more complex to implement).

Posted: Wed Oct 06, 2010 6:24 am
by chulett
4. "steps to take to load fact table". This part is actually the easiest part. Stream your core data in - which is typically whatever measures the fact contains and the associated business keys - do lookups against all of the fact's dimension tables to retrieve their surrogates and then insert the fact record. Wallah! :wink:

Posted: Wed Oct 06, 2010 11:23 am
by Phani01
Thanks guys.

While inserting into fact table, do we drop the indexes and remove any relationships between dimensions and fact for better performance?

Posted: Wed Oct 06, 2010 1:02 pm
by chulett
Depends, maybe yes maybe no. Usually one would try both ways to check.