Restarting an Oracle Bulk Load where it left off...

Post questions here relating to DataStage Server Edition for such areas as Server job design, DS BASIC, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

vdr123
Participant
Posts: 65
Joined: Fri Nov 14, 2003 9:23 am

Restarting an Oracle Bulk Load where it left off...

Post by vdr123 »

We are planning to use the Oracle Bulk Load stage to do a bulk load (not a manual load)...
What we intend to do is restart the bulk load from where it left off...
Eg:
Say we have 100,000 rows to load
commit frequency = 10,000
During the bulk load it committed 5 times (50,000 rows) and failed on the 6th commit...

Is there a way I can get the number of rows bulk loaded (a count of the records) from an Oracle log file or from a DataStage log file?

Does the Oracle bulk loader OR the DataStage engine write any log with the bulk-load progress, so that I can use that log to get the number of commits and restart from where the Oracle bulk loader failed?

Hope I am clear... please let me know... Any feedback is appreciated.

Thank You
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

There is NO capability within DataStage to do what you want.

What you CAN do is control the bulk-load operation yourself using intelligent scripting. Oracle's sqlldr generates a log file that you can parse to get the last commit point. Then you can run the sqlldr load with a modified control file instruction set.
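For illustration, here is a minimal shell sketch of pulling the loaded row count out of the sqlldr log (the log file name is a placeholder, and the "Rows successfully loaded" wording is an assumption - check what your sqlldr version actually writes, including on an aborted run):

    #!/bin/sh
    # Pull the number of rows sqlldr reported as loaded from its log.
    # Assumes a log line of the form "  50000 Rows successfully loaded."
    LOGFILE=bulk_load.log
    ROWS_LOADED=`grep 'Rows successfully loaded' $LOGFILE | awk '{print $1}'`
    echo "Last run loaded $ROWS_LOADED rows; use this as the SKIP value on restart."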

For example, if your load method is TRUNCATE, then on the restart you want to use APPEND. Otherwise, you just append. You can give sqlldr a SKIP value (a number of rows) and it will read the load file and skip rows until that count is met. For normal loads this is set to 0; for restarts you would set it to the row count you desire.
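As a sketch, the two invocations might look like this (the userid, control file, and log names are placeholders; skip, control, and log are standard sqlldr command-line options):

    # Normal run: start from the first row of the load file.
    sqlldr userid=scott/tiger control=load.ctl log=bulk_load.log skip=0

    # Restart: skip the 50,000 rows already committed, with a control
    # file that specifies APPEND rather than TRUNCATE.
    sqlldr userid=scott/tiger control=load_append.ctl log=bulk_load.log skip=50000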

Run sqlldr with no arguments to get a list of the options, and also check out http://asktom.oracle.com for help.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
vdr123
Participant
Posts: 65
Joined: Fri Nov 14, 2003 9:23 am

Post by vdr123 »

We are using DS to do this...
The job flow is something like this:
HashFile -> Transform -> SeqFile -> OraBulk(DB)

In the Transform we use a parameter (parmStartRow) in the constraint on the link, which says only pick records from the hash file where @INROWNUM > parmStartRow.

So initially parmStartRow = 0.
When the job fails, say after the 5th commit, we GREP the last committed record count from the bulk-load log file and pass it as a parameter to this job, so now
parmStartRow = (whatever number the grep pulls from the bulk-load log)
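A rough sketch of that recovery step, assuming the sqlldr log wording shown earlier and using placeholder project and job names with the standard dsjob command line:

    #!/bin/sh
    # Read the committed row count from the sqlldr log and re-run the
    # DataStage job with it as parmStartRow.
    ROWS=`grep 'Rows successfully loaded' bulk_load.log | awk '{print $1}'`
    dsjob -run -param parmStartRow=$ROWS MyProject MyLoadJob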

So, we need to know if ORABULK or Oracle writes any log file while doing a bulk load with DS.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Well, first of all, hash files are UNORDERED. You should NEVER use @INROWNUM as a means of reading a hash file and expecting it to return consistently ordered data, unless you're using a UV/ODBC stage to give you a SQL interface and an ORDER BY clause.

Second, how are you invoking sqlldr? You need to deal with the truncate/append issue as I mentioned. Your method of regenerating a load file each time you run in recovery mode is problematic, not only for the reason above but also from a design viewpoint: you incur all the overhead of DataStage re-processing the data into the staging file, which performs poorly and is unnecessary.

Since all you really need to do to recover is re-invoke the bulk-load script, giving it a simple number for the skip value parsed from the sqlldr log, your easiest solution is that route. sqlldr does generate screen output and a log file. Just search the Oracle documentation or asktom and it's all there. I'm recommending a solution that I use, not something I just thought up.
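Putting the pieces together, a minimal restart wrapper along these lines might look like the following sketch (the file names and the APPEND control file are assumptions for illustration):

    #!/bin/sh
    # Run sqlldr; on failure, re-run it with SKIP set to the number of
    # rows the log says were already committed.
    USERID=scott/tiger
    LOG=bulk_load.log

    sqlldr userid=$USERID control=load.ctl log=$LOG skip=0
    if [ $? -ne 0 ]; then
        SKIP=`grep 'Rows successfully loaded' $LOG | awk '{print $1}'`
        echo "Load failed after $SKIP rows; restarting in append mode."
        # load_append.ctl is the same control file with APPEND in place of TRUNCATE.
        sqlldr userid=$USERID control=load_append.ctl log=$LOG skip=$SKIP
    fi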
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

A DataStage job prepares the data and control files for sqlldr. Once that has been completed, the DataStage job has essentially finished all it has to do - you may or may not choose to invoke sqlldr, but in no sense other than being a child process is sqlldr under the control of DataStage.

Therefore, what you seek to do is not something that can reasonably be expected to be a DataStage capability.

That said, a sufficiently skilled DataStage practitioner (in particular, a DataStage BASIC programmer) could create the kind of subsidiary job Ken suggested: parse the sqlldr log file, figure out the skip value, regenerate the script file with this information in it, and re-invoke sqlldr. Such a mechanism would also need to handle the case where the sqlldr session finished successfully and there is no further loading to take place.
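A fragment of that completion check, as a hedged sketch (the input file name and the log pattern are assumptions; the total row count comes from a plain wc -l on the sequential load file):

    #!/bin/sh
    # Skip the restart entirely if every input row was already loaded.
    TOTAL=`wc -l < load_data.dat | awk '{print $1}'`
    LOADED=`grep 'Rows successfully loaded' bulk_load.log | awk '{print $1}'`
    if [ "$LOADED" -ge "$TOTAL" ]; then
        echo "All $TOTAL rows loaded; nothing left to restart."
        exit 0
    fi
    # ...otherwise re-invoke sqlldr with skip=$LOADED as described above.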

The better approach, of course, would be to involve your Oracle DBA to ensure that adequate space is available so that the bulk loads never fail.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.