Restarting an Oracle Bulk Load where it left off...

Post questions here relating to DataStage Server Edition for such areas as Server job design, DS BASIC, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

vdr123
Participant
Posts: 65
Joined: Fri Nov 14, 2003 9:23 am

Restarting an Oracle Bulk Load where it left off...

Post by vdr123 »

We are planning to use the Oracle Bulk Load stage to do a bulk load (not a manual load)...
What we intend to do is restart the bulk load from where it left off...
Eg:
Say we have 100,000 rows to load
commit frequency = 10,000
During the bulk load it committed 5 times (50,000 rows) and failed on the 6th commit...

Is there a way I can get the number of rows bulk loaded (a count of the records) from an Oracle log file or from a DataStage log file?

Does the Oracle bulk loader OR the DataStage engine write any log with the bulk-load progress, so that I can use that log to get the number of commits and restart from where the Oracle bulk loader failed?

Hope I am clear... please let me know... Any feedback is appreciated.

Thank You
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

There is NO capability within DataStage to do what you want.

What you CAN do is control the bulk-load operation yourself using intelligent scripting. Oracle's sqlldr generates a log file that you can parse to get the last commit point. Then you can run the sqlldr load with a modified control file instruction set.
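For illustration, here is a minimal shell sketch of pulling the loaded row count out of the sqlldr log (the log file name is a placeholder, and the "Rows successfully loaded" wording is an assumption - check what your sqlldr version actually writes, including on an aborted run):

    #!/bin/sh
    # Pull the number of rows sqlldr reported as loaded from its log.
    # Assumes a log line of the form "  50000 Rows successfully loaded."
    LOGFILE=bulk_load.log
    ROWS_LOADED=`grep 'Rows successfully loaded' $LOGFILE | awk '{print $1}'`
    echo "Last run loaded $ROWS_LOADED rows; use this as the SKIP value on restart."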

For example, if your load method is TRUNCATE, then on the restart you want to use APPEND. Otherwise, you just append. You can give sqlldr a SKIP value (a number of rows) and it will read the load file and skip rows until that count is met. For normal loads this is set to 0; for restarts you would set it to the row count you desire.
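As a sketch, the two invocations might look like this (the userid, control file, and log names are placeholders; skip, control, and log are standard sqlldr command-line options):

    # Normal run: start from the first row of the load file.
    sqlldr userid=scott/tiger control=load.ctl log=bulk_load.log skip=0

    # Restart: skip the 50,000 rows already committed, with a control
    # file that specifies APPEND rather than TRUNCATE.
    sqlldr userid=scott/tiger control=load_append.ctl log=bulk_load.log skip=50000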

Run sqlldr with no arguments to get a list of the options, and also check out http://asktom.oracle.com for help.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
vdr123
Participant
Posts: 65
Joined: Fri Nov 14, 2003 9:23 am

Post by vdr123 »

We are using DS to do this...
The job flow is something like this:
HashFile -> Transform -> SeqFile -> OraBulk(DB)

In the Transform we use a parameter (parmStartRow) in the constraint on the link, which says only pick records from the hash file where @INROWNUM > parmStartRow.

So initially parmStartRow = 0.
When the job fails, say after the 5th commit, we GREP the last committed record count from the bulk-load log file and pass it as a parameter to this job, so now
parmStartRow = (whatever number the grep pulls from the bulk-load log)
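A rough sketch of that recovery step, assuming the sqlldr log wording shown earlier and using placeholder project and job names with the standard dsjob command line:

    #!/bin/sh
    # Read the committed row count from the sqlldr log and re-run the
    # DataStage job with it as parmStartRow.
    ROWS=`grep 'Rows successfully loaded' bulk_load.log | awk '{print $1}'`
    dsjob -run -param parmStartRow=$ROWS MyProject MyLoadJob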

So, we need to know if ORABULK or Oracle writes any log file while doing a bulk load with DS.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Well, first of all, hash files are UNORDERED. You should NEVER use @INROWNUM as a means of reading a hash file and expecting it to return consistently ordered data, unless you're using a UV/ODBC stage to give you a SQL interface and an ORDER BY clause.

Second, how are you invoking sqlldr? You need to deal with the truncate/append issue as I mentioned. Your method of regenerating a load file each time you run in recovery mode is problematic, not only for the reason above but also from a design viewpoint: you incur all the overhead of DataStage re-processing the data into the staging file, which performs poorly and is unnecessary.

Since all you really need to do to recover is re-invoke the bulk-load script, giving it a simple number for the skip value parsed from the sqlldr log, your easiest solution is that route. sqlldr does generate screen output and a log file. Just search the Oracle documentation or asktom and it's all there. I'm recommending a solution that I use, not something I just thought up.
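Putting the pieces together, a minimal restart wrapper along these lines might look like the following sketch (the file names and the APPEND control file are assumptions for illustration):

    #!/bin/sh
    # Run sqlldr; on failure, re-run it with SKIP set to the number of
    # rows the log says were already committed.
    USERID=scott/tiger
    LOG=bulk_load.log

    sqlldr userid=$USERID control=load.ctl log=$LOG skip=0
    if [ $? -ne 0 ]; then
        SKIP=`grep 'Rows successfully loaded' $LOG | awk '{print $1}'`
        echo "Load failed after $SKIP rows; restarting in append mode."
        # load_append.ctl is the same control file with APPEND in place of TRUNCATE.
        sqlldr userid=$USERID control=load_append.ctl log=$LOG skip=$SKIP
    fi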
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

A DataStage job prepares the data and control files for sqlldr. Once that has been completed, the DataStage job has essentially finished all it has to do - you may or may not choose to invoke sqlldr, but in no sense other than being a child process is sqlldr under the control of DataStage.

Therefore, what you seek to do is not something that can reasonably be expected to be a DataStage capability.

That said, a sufficiently skilled DataStage practitioner (in particular, a DataStage BASIC programmer) could create the kind of subsidiary job Ken suggested: parse the sqlldr log file, figure out the skip value, regenerate the script file with this information in it, and re-invoke sqlldr. Such a mechanism would also need to handle the case where the sqlldr session finished successfully and there is no further loading to take place.
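A fragment of that completion check, as a hedged sketch (the input file name and the log pattern are assumptions; the total row count comes from a plain wc -l on the sequential load file):

    #!/bin/sh
    # Skip the restart entirely if every input row was already loaded.
    TOTAL=`wc -l < load_data.dat | awk '{print $1}'`
    LOADED=`grep 'Rows successfully loaded' bulk_load.log | awk '{print $1}'`
    if [ "$LOADED" -ge "$TOTAL" ]; then
        echo "All $TOTAL rows loaded; nothing left to restart."
        exit 0
    fi
    # ...otherwise re-invoke sqlldr with skip=$LOADED as described above.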

The better approach, of course, would be to involve your Oracle DBA to ensure that adequate space is available so that the bulk loads never fail.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.