recoverability from failures

aartlett · Post by **aartlett** » Tue Sep 27, 2005 12:20 am

Kool C,
But a well-designed, tested and documented system will not fail in production.

Seriously, I have found (in way too many years) that most support doco is about as useful as most design doco after the system has been modified ... none at all as it isn't kept up to date.

I would create an issues log, track errors, and if the same problem occurs two or three times, fix it so it NEVER happens again and close it off.
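As a rough illustration of the "two or three times" rule, here is a minimal Python sketch. It assumes the issues log has already been reduced to a list of short problem descriptions; `recurring_issues` and the `threshold` parameter are hypothetical names, not part of any tool mentioned in this thread.

```python
from collections import Counter


def recurring_issues(log_entries, threshold=2):
    """Return problems seen at least `threshold` times, most frequent first."""
    counts = Counter(log_entries)
    return [key for key, n in counts.most_common() if n >= threshold]


# Hypothetical log contents, for illustration only.
issues_log = [
    "tablespace full",
    "missing file",
    "tablespace full",
    "missing file",
    "tablespace full",
    "bad date",
]
candidates_for_permanent_fix = recurring_issues(issues_log)
```

Anything this returns is a candidate for a permanent fix rather than another manual restart.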

You can't predict what will happen unless there are outside forces.

Well, one thing you can predict: at least once, the files or external system providing your data won't be there when you need them, and you need to have a way to handle it.
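One possible way to handle a late file, sketched in Python rather than in any particular ETL tool: poll for it for a limited time, then fail cleanly so the job can alert someone instead of crashing halfway. The function name, timeout and polling interval are assumptions for illustration.

```python
import os
import time


def wait_for_file(path, timeout_s=60.0, poll_s=5.0, sleep=time.sleep):
    """Return True once `path` exists, or False if it has not appeared in time."""
    waited = 0.0
    while not os.path.exists(path):
        if waited >= timeout_s:
            return False  # give up; the caller decides whether to alert or skip
        sleep(poll_s)
        waited += poll_s
    return True
```

The `sleep` argument is only there so the wait can be replaced in a test; a scheduler-level file watcher would do the same job if you have one.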

Good luck.

ray.wurlod · Post by **ray.wurlod** » Tue Sep 27, 2005 1:14 am

The issues log should be busiest in the "dev -> test -> back to dev" cycle. That way you should get ETL as bulletproof as possible into the production environment. Of course, whoever does the testing needs to be sufficiently astute, cynical and painstaking.

Failures in the ETL itself, such as database failures (cannot extend tablespace, for example), should be predicted, and a restart capability designed into the ETL processes. Job sequences have this to some extent, but you also need to allow for such things as staging areas, so that you can reliably reproduce the run or part of it, and for recording the progress of the load. You would probably have designed jobs that can pick up from somewhere other than row number 1.
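To make the "pick up from somewhere other than row number 1" idea concrete, here is a minimal Python sketch of recording load progress in a checkpoint file. It is not how job sequences implement restart; the file format, function names and `batch_size` are all assumptions, and the staged data is modelled as a plain list.

```python
import json
import os


def load_checkpoint(path):
    """Return the number of rows already loaded, or 0 if no checkpoint exists."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["rows_done"]


def save_checkpoint(path, rows_done):
    """Write progress via a temp file so a crash cannot leave a half-written checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"rows_done": rows_done}, f)
    os.replace(tmp, path)


def run_load(rows, load_row, checkpoint_path, batch_size=2):
    """Load staged rows, resuming after the last recorded row; return rows attempted."""
    start = load_checkpoint(checkpoint_path)
    for i in range(start, len(rows)):
        load_row(rows[i])
        done = i + 1
        # Record progress at each batch boundary and at the end of the run.
        if done % batch_size == 0 or done == len(rows):
            save_checkpoint(checkpoint_path, done)
    return len(rows) - start
```

Note that rows loaded after the last checkpoint are loaded again on restart, so either the target commit has to line up with the checkpoint or the load has to tolerate repeats (an upsert, say).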

vmcburney · Post by **vmcburney** » Tue Sep 27, 2005 6:29 am

It can also include the basics: how to tell if the server is running, how to stop and start the server, how to view error logs (if you don't have error messages delivered by email), how to restart a failed job, etc.

aartlett · Post by **aartlett** » Tue Sep 27, 2005 6:57 am

Those are great generic support tips, for almost any production system.

Maybe we should start a FAQ or a wiki page on how to support?

kool_cons · Post by **kool_cons** » Wed Sep 28, 2005 11:54 pm

Thanks, guys.