loading strategy

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
Jay
Participant
Posts: 105
Joined: Tue Nov 11, 2003 8:28 pm

loading strategy

Post by Jay »

hi all,

we are designing our load strategy. one case is that during loading of say 1 million rows, job fails after n rows have loaded. now what do we do on the production side. how do we handle the loading from the
(n +1)th row.

my thought process is to first commit on the target database then we do an update/insert from the input file(with 1 million rows).

or is there a way to find out till what row loading has been done and then continue from there.

thanks
jay
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

While there really isn't a way to 'find out' where an aborted load left off, you *can* build this logic into your process. You'll need a mechanism to track where you are in the load. You could leverage one of the internal counters like @INROWNUM. Save it somewhere, say in conjunction with a commit to the database every X rows, to a hash or even a sequential file. Build in something to clear it (or set it to zero) when the job completes normally.

Then all you would need is a mechanism to see if this information exists when the job starts, if it does it would mean you need to restart. Pass in the starting row number as a parameter and skip (via a constraint) all of the rows up to that point.

This is all at a somewhat high level, the details of the implementation are up to you. Also keep in mind the fact that your loads must be "repeatable", so that when you restart it, N+1 still equals N+1, if you know what I mean. Typically, this means landing the data first.

Hope this helps,
-craig

"You can never have too many knives" -- Logan Nine Fingers
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
IMHO, I prefer the staging area aproach.
if that went well no need for "to prod" step to fail unless a real problem occured like a machine crash etc'....
then again you still might need to know what was loaded till the crash.

that is only my HO.

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
Jay
Participant
Posts: 105
Joined: Tue Nov 11, 2003 8:28 pm

thanks

Post by Jay »

thank you, all, for the ideas
Post Reply