recoverability from failures
hai
Last edited by kool_cons on Thu Jan 05, 2006 12:43 pm, edited 1 time in total.
Kool C,
But a well-designed, tested and documented system will not fail in production.
Seriously, I have found (in way too many years) that most support doco is about as useful as most design doco after the system has been modified ... none at all, as it isn't kept up to date.
I would create an issues log, track errors, and if the same problem occurs two or three times, fix it so it NEVER happens again and close it off.
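A minimal Python sketch of such an issues log, assuming errors can be keyed by job name and error text. The `IssuesLog` class and its methods are hypothetical, not a DataStage feature: it counts repeat occurrences and flags an issue for a permanent fix once it reaches the threshold.

```python
from collections import Counter


class IssuesLog:
    """Hypothetical issues log that counts recurring errors per job."""

    def __init__(self, fix_threshold=2):
        # Occurrences after which an issue is flagged for a permanent fix
        # (the advice above is two or three).
        self.fix_threshold = fix_threshold
        self.counts = Counter()
        self.closed = []

    def record(self, job, error):
        """Log one occurrence; return True when the issue should be fixed for good."""
        self.counts[(job, error)] += 1
        return self.counts[(job, error)] >= self.fix_threshold

    def close(self, job, error):
        """Mark an issue as permanently fixed and stop counting it."""
        self.counts.pop((job, error), None)
        self.closed.append((job, error))
```

For example, `log.record("LoadCustomers", "source file missing")` returns False the first time and True the second time with the default threshold of 2.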
You can't predict what will go wrong, except where outside forces are involved.
Well, one thing you can predict: at least once, the files or external system providing your data won't be there when you need them, and you need a way to handle that.
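One way to handle the missing-input case, sketched in Python under the assumption that the source arrives as a file: poll for it up to a deadline and report failure cleanly rather than running the load on nothing. `wait_for_source` and its timeout values are hypothetical.

```python
import time
from pathlib import Path


def wait_for_source(path, timeout_s=600.0, poll_s=30.0):
    """Wait until a source file exists and is non-empty.

    Returns True if the file arrived within timeout_s seconds, False otherwise,
    so the caller can alert and stop cleanly instead of loading nothing.
    """
    target = Path(path)
    deadline = time.monotonic() + timeout_s
    while True:
        if target.is_file() and target.stat().st_size > 0:
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Sleep for one poll interval, but never past the deadline.
        time.sleep(min(poll_s, remaining))
```

A caller that gets False back would typically alert someone and exit with a non-zero status, so the scheduler shows the run as failed rather than silently successful.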
Good luck.
Andrew
Think outside the Datastage you work in.
There is no True Way, but there are true ways.
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
The issues log should be busiest in the "dev -> test -> back to dev" cycle. That way you should get ETL as bulletproof as possible into the production environment. Of course, whoever does the testing needs to be sufficiently astute, cynical and painstaking.
Handling of failures in the ETL itself (database failures such as being unable to extend a tablespace, for example) should be predicted, and a restart capability designed into the ETL processes. Job sequences have this to some extent, but you also need to allow for such things as staging areas, so that you can reliably reproduce the run or part of it, and for recording the progress of the load. You would probably also design jobs that can pick up from somewhere other than row number 1.
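A minimal sketch of the "record progress, restart from somewhere other than row 1" idea, written in Python rather than as DataStage jobs. It assumes the rows come from a stable staging area and that `write_batch` (hypothetical) writes and commits one batch to the target. The checkpoint is a small JSON file holding the number of rows already committed; its name and format are also assumptions.

```python
import json
import os
from pathlib import Path


def load_with_restart(rows, write_batch, checkpoint_path, batch_size=1000):
    """Load rows in batches, recording progress so a rerun resumes mid-stream.

    rows:        list of staged rows (a stable, re-readable staging area).
    write_batch: hypothetical callable that writes and commits one batch.
    Returns the number of rows written by this run.
    """
    ckpt = Path(checkpoint_path)
    # Resume after the last committed row, or start at row 0 on a clean run.
    start = json.loads(ckpt.read_text())["committed"] if ckpt.exists() else 0
    written = 0
    for offset in range(start, len(rows), batch_size):
        batch = rows[offset:offset + batch_size]
        write_batch(batch)  # may raise, e.g. when a tablespace cannot extend
        # Record progress only after the batch is committed; write-then-rename
        # keeps the checkpoint file intact if we crash mid-write.
        tmp = ckpt.with_suffix(".tmp")
        tmp.write_text(json.dumps({"committed": offset + len(batch)}))
        os.replace(tmp, ckpt)
        written += len(batch)
    # Successful completion: remove the checkpoint so the next run starts fresh.
    ckpt.unlink(missing_ok=True)
    return written
```

There is still a window between the target commit and the checkpoint write, so a crash there replays one batch on restart. The target write should therefore be idempotent (an upsert, for example), or the progress marker should be stored in the same database transaction as the batch.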
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
Support documentation can also cover the basics: how to tell if the server is running, how to stop and start the server, how to view error logs (if you don't have error messages delivered by email), how to restart a failed job, etc.
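For the "is the server running?" item, a runbook can include a quick connectivity check. This Python sketch only tests whether something accepts TCP connections on a given host and port. The function name is hypothetical, and the host and port are whatever your site's engine actually uses. A successful connection does not prove the engine is healthy, only that something is listening.

```python
import socket


def server_is_listening(host, port, timeout_s=5.0):
    """Return True if something accepts TCP connections on host:port.

    This only shows that a process is listening; it does not prove the
    engine is healthy. Host and port are site-specific assumptions.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Refused, timed out, or host unreachable: treat as "not running".
        return False
```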
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn