One large sequence or separate sequences?

RodBarnes · Post by **RodBarnes** » Wed Jun 15, 2005 2:10 pm

I am sure this has been asked before but I know we all enjoy sharing our opinions.

We have a complete ETL sequence and now have the case where an additional dimension and fact table is needed. These new tables are related to the existing tables by using shared dimensions. So....

I can think of reasons why to do put these new jobs in a separate sequence; e.g., modularity and minimizing cross-talk between engineers when working on the modules.

I can also think of reasons why they should end up in the same sequence; e.g., triggers between jobs, timing of the jobs not having to wait, etc.

What is the best practices here?

kduke · Post by **kduke** » Wed Jun 15, 2005 6:44 pm

I say separate small sequences with only dependent jobs in one sequences. Say you have customer snowfalked off of order then order would need to follow customer then I would put these in the same sequence to control the jobs.

RodBarnes · Post by **RodBarnes** » Thu Jun 16, 2005 9:20 am

Thanks for your input. After discussing this some, we've concluded that we're going to organize our project so the shared dimensions will be built in one sequence, with individual sequences for each of the additional dimensions and fact tables. We'll probably have a single master sequence that will orchestrate everything (so we'll have the benefit of using the triggers between sequences) and it will handle the task management and logging.

My original question mostly came from consideration for rollback and error recovery, and how we could best manage that. But we've decided probably the safest and most accurate is just to rely on backups.

I'd still like to hear from others who've more experience than I with ETL.

kduke wrote:I say separate small sequences with only dependent jobs in one sequences.