recoverability from failures

kool_cons
Participant
Posts: 68
Joined: Thu Jul 07, 2005 3:41 pm

Post by kool_cons »

hai
Last edited by kool_cons on Thu Jan 05, 2006 12:43 pm, edited 1 time in total.
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

Kool C,
But a well designed, tested and documented system will not fail in production.

Seriously, I have found (in way too many years) that most support doco is about as useful as most design doco after the system has been modified ... none at all as it isn't kept up to date.

I would create an issues log, track errors, and if the same problem occurs two or three times, fix it so it NEVER happens again and close it off.
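
As a rough illustration of that kind of issues log, here is a minimal Python sketch. The class name `IssuesLog`, its methods, and the threshold of two occurrences are all hypothetical, chosen only to show the idea of counting repeats and flagging them for a permanent fix; nothing here is a DataStage feature.

```python
from collections import Counter


class IssuesLog:
    """Hypothetical in-memory issues log that flags recurring problems."""

    def __init__(self, threshold=2):
        self.threshold = threshold  # occurrences before a fix is due
        self.counts = Counter()     # (job, message) -> times seen
        self.closed = set()         # issues fixed and closed off

    def record(self, job, message):
        """Log one occurrence; return True if the issue now needs a fix."""
        key = (job, message)
        self.counts[key] += 1
        return self.counts[key] >= self.threshold and key not in self.closed

    def close(self, job, message):
        """Mark an issue as fixed so it no longer shows as outstanding."""
        self.closed.add((job, message))

    def needs_fix(self):
        """Return the open issues that have reached the threshold."""
        return sorted(
            key for key, seen in self.counts.items()
            if seen >= self.threshold and key not in self.closed
        )
```

In practice the log would live in a table or a ticketing tool rather than in memory; the point is only that repeats are counted per job and per error, and closed off once fixed.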

You can't predict what will happen unless there are outside forces.

Well, one thing you can predict: at least once, the files or external system providing your data won't be there when you need them, and you need a way to handle that.
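
One common way to handle a late source file is to poll for it with a time limit and raise the alarm if it never turns up. The Python sketch below shows the idea only; the function name `wait_for_file` and the default timings are assumptions, and in a real Server job this check would more likely sit in a job sequence or a before-job routine.

```python
import os
import time


def wait_for_file(path, timeout_s=3600, poll_s=60):
    """Poll until path exists; return False if timeout_s passes first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False  # caller decides: alert support, skip, or abort
        time.sleep(poll_s)
```

A caller would typically treat a `False` result as a known, documented condition (notify someone, skip the feed, or stop the run cleanly) rather than letting the job abort on a missing file.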

Good luck.
Andrew

Think outside the Datastage you work in.

There is no True Way, but there are true ways.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The issues log should be busiest in the "dev -> test -> back to dev" cycle. That way you should get ETL as bulletproof as possible into the production environment. Of course, whoever does the testing needs to be sufficiently astute, cynical and painstaking.

Handling of failures in the ETL itself (database failures such as being unable to extend a tablespace, for example) should be predicted, and a restart capability designed into the ETL processes. Job sequences have this to some extent, but you also need to allow for such things as staging areas, so that you can reliably reproduce the run or part of it, and for recording the progress of the load. You would probably have designed jobs that can pick up from somewhere other than row number 1.
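
To make "recording progress" and "picking up from somewhere other than row number 1" concrete, here is a minimal Python sketch of a checkpointed load. It is not how DataStage implements restart; `load_with_restart`, `save_checkpoint`, the JSON checkpoint file and the `write_row` callback are all hypothetical names used for illustration, and the rows are assumed to come from a staged source that can be re-read in the same order.

```python
import json
import os


def save_checkpoint(path, rows_done):
    """Record progress atomically: write a temp file, then rename it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"rows_done": rows_done}, f)
    os.replace(tmp, path)


def load_with_restart(rows, checkpoint_path, write_row, commit_every=100):
    """Load rows, skipping those a previous run already recorded as done.

    Returns the number of rows written by this run.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["rows_done"]

    done = start
    for index, row in enumerate(rows):
        if index < start:
            continue  # already loaded in an earlier run
        write_row(row)  # may raise; the checkpoint keeps its last value
        done = index + 1
        if done % commit_every == 0:
            save_checkpoint(checkpoint_path, done)
    save_checkpoint(checkpoint_path, done)
    return done - start
```

Note that rows written after the last checkpoint are replayed on restart, so either the target write needs to be idempotent or the checkpoint interval should match the database commit interval.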
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

It can also include the basics: how to tell if the server is running, how to stop and start the server, how to view error logs (if you don't have error messages delivered by email), how to restart a failed job, etc.
aartlett
Charter Member
Posts: 152
Joined: Fri Apr 23, 2004 6:44 pm
Location: Australia

Post by aartlett »

Those are great generic support tips, for almost any production system.

Maybe we should start a FAQ, or a wiki page, on how to support?
Andrew

kool_cons
Participant
Posts: 68
Joined: Thu Jul 07, 2005 3:41 pm

Post by kool_cons »

Thanks, guys.