Job Performance
Moderators: chulett, rschirm, roy
Hi,
I have a job that processes 9 million rows and takes 12 hours to complete. Whenever the job fails, for whatever reason, we have to restart it from the beginning. I am staging the data, but that is of no use if, say, the job breaks after 5 million rows; there is no option other than starting the process again from the beginning.
I'm looking for an option/design that lets the job restart from where it broke, i.e. if it broke at 5 million rows I should be able to resume processing from the last record it processed.
Thanks in advance!
Regards,
Munish
Use multiple job instances and divide the source data into N groups. Use N instances to process the data in roughly 1/N of the time, where ideally N is the number of CPUs available. Remove all database reference lookups, using hashed files instead. If a single instance fails, re-run just that instance.
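Not DataStage-specific, but the modulus-split idea above can be sketched in Python (the function name and parameters are illustrative): each instance keeps only the rows whose position modulo N matches its own instance number, so N instances cover the data exactly once and a failed instance can be re-run on just its own slice.

```python
def rows_for_instance(rows, instance, n_instances):
    """Keep only the rows assigned to this instance (position MOD N)."""
    return [row for pos, row in enumerate(rows) if pos % n_instances == instance]

# Three instances split ten rows; each row lands in exactly one group.
rows = list(range(10))
groups = [rows_for_instance(rows, k, 3) for k in range(3)]
print(groups)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```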
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
It's conceptually easy to design - you have to keep track of how far you have successfully processed. You also have to design so as to be able to process beginning from a given point.
Quite a few techniques exist. It's easiest if your overall design includes staging areas after extraction and before loading. That way, you can do the transformation part just once. And I recommend sequential files wherever possible for staging; they are the fastest.
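As a language-neutral sketch of the "keep track of how far you have successfully processed" idea (the file name and batch size here are made up for illustration): record the count of successfully processed rows in a small checkpoint file, and on restart skip that many rows before resuming.

```python
import os

CHECKPOINT = "job.checkpoint"  # hypothetical progress file

def read_checkpoint():
    """Rows already processed in a previous run, or 0 on a fresh start."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return int(f.read().strip())
    return 0

def process(rows, batch_size=1000):
    done = read_checkpoint()
    for pos, row in enumerate(rows):
        if pos < done:
            continue  # already handled before the failure
        # ... transform and load `row` here ...
        if (pos + 1) % batch_size == 0 or pos + 1 == len(rows):
            with open(CHECKPOINT, "w") as f:
                f.write(str(pos + 1))  # commit progress once per batch

process(list(range(5)), batch_size=2)
print(read_checkpoint())  # 5
```

Writing the checkpoint once per batch rather than per row keeps the overhead negligible while bounding the amount of rework after a failure to one batch.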
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Well, I have done a bit of R&D on this, and the fastest way I found was the following. The logic is only for a fact load, but it can be modified and adapted for dimensions.
In JobA
I read from a sequential file; in the Transformer stage I do error checking, data validation, and data massaging, and then instead of loading into the Oracle table I introduced a staging area (a sequential file):
seq ---> transformer ---> StgSeq file
In JobB
After completion of JobA, I used sqlldr with DIRECT=TRUE and loaded StgSeqFile into the Oracle table.
In the sqlldr invoking routine I added some more logic:
1. I always created the log and bad files with a timestamp, and before running JobA I cleared these log files.
2. Before running sqlldr, some logic loops through the log files, reads records loaded + records rejected for the same file, and uses the total in the SKIP option.
This way I had control over restarting the job midway.
I tested the logic on my laptop with 720,000 records; total load time was 20 minutes: approximately 17 minutes for JobA and 2 minutes 40 seconds for JobB.
Shantanu Choudhary
In the Job Sequence:
JobA ----> JobB (actually a routine). Here is the routine if you need it.
RestartabilitySqlldr(ORACLE_HOME,UserName,DataBaseName,UserPassword,ControlFilePath,ControlFileName,DataFilePath,DataFileName,LogFilePath,LogFileName,BadFilePath,BadFileName,Parallel,SkipUnusableIndexes,SkipIndexMaintenence,DirectLoad,CommitPoint,ErrorLimit) <--- parameters passed to the routine
-----------------------------------------------------------------------------------
RoutineName='RestartabilitySqlldr'
deffun ExecSqlldr(A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,A11,A12,A13,A14,A15,A16,A17,A18,A19) CALLING "DSU.ExecSqlldr"
Ans=''
Skip=0
Call DSLogInfo('Starting Routine ':RoutineName,TimeDate())
VarTotalRead=0
Var=0
i=1
Call DSLogInfo(RoutineName:' - Reading Through The Log Files To Get The SKIP Value',TimeDate())
Command='dir /b ':LogFilePath:LogFileName:'*.log'
Call DSExecute("DOS",Command,Output,SystemReturnCode)
If Trim(Output) = 'File Not Found' Then
   Skip=0
End Else
   Loop
      If Trim(Output<i>) <> '' Then
         OpenSeq LogFilePath:Output<i> To FileVar Then
            * Initialise per-file counters
            RowsLoaded=0
            RowsRead=0
            RowsRejected=0
            RowsDiscarded=0
            Loop
               ReadSeq OneLine From FileVar Then
                  If Trim(Oconv(OneLine,'MC/N')) = 'Rows successfully loaded.' Then
                     RowsLoaded=Oconv(OneLine,'MCN')
                  End
                  If Trim(Oconv(OneLine,'MC/N')) = 'Total logical records rejected:' Then
                     RowsRejected=Oconv(OneLine,'MCN')
                  End
                  If Trim(Oconv(OneLine,'MC/N')) = 'Total logical records discarded:' Then
                     RowsDiscarded=Oconv(OneLine,'MCN')
                  End
               End Else
                  * End of file: add this file's counts to the running total
                  VarTotalRead=Var+RowsDiscarded+RowsRejected+RowsLoaded
                  Var=VarTotalRead
                  Goto ExitPoint
               End
            Repeat
         End Else
            Call DSLogFatal(RoutineName:' - Unable To Open The Log File',TimeDate())
            Ans=-1
         End
ExitPoint:
         CloseSeq FileVar
      End Else
         Goto FinalExitPoint
      End
      i=i+1
   Repeat
FinalExitPoint:
   Skip=VarTotalRead
End
Call DSLogInfo(RoutineName:' - ':Skip:' Records Will Be Skipped',TimeDate())
TimeStamp=Oconv(TimeDate()[' ',1,1]:Oconv(Iconv(TimeDate()[' ',2,1],"D"),"DMDY[2,2,4]"),"MCN")
Call DSLogInfo('Calling Routine ExecSqlldr - ':TimeDate(),RoutineName)
Output=ExecSqlldr(ORACLE_HOME,UserName,DataBaseName,UserPassword,ControlFilePath,ControlFileName,DataFilePath,DataFileName,LogFilePath,LogFileName:'_':TimeStamp,BadFilePath,BadFileName:'_':TimeStamp,Parallel,SkipUnusableIndexes,SkipIndexMaintenence,DirectLoad,CommitPoint,ErrorLimit,Skip)
Call DSLogInfo('Finished Routine ExecSqlldr - ':TimeDate(),RoutineName)
If Output = 0 Then
   Call DSLogInfo('Completed ':RoutineName:' Successfully',TimeDate())
End Else
   Call DSLogFatal(RoutineName:' Generated Error, Return Code: ':Output,TimeDate())
End
-------------------------------------------------------------------------------
The function above uses another routine, "ExecSqlldr"; below is the code for it.
ExecSqlldr(ORACLE_HOME,UserName,DataBaseName,UserPassword,ControlFilePath,ControlFileName,DataFilePath,DataFileName,LogFilePath,LogFileName,BadFilePath,BadFileName,Parallel,SkipUnusableIndexes,SkipIndexMaintenence,DirectLoad,CommitPoint,ErrorLimit,Skip)
-----------------------------------------------------------------------
RoutineName='ExecSqlldr'
Command=ORACLE_HOME:'/bin/sqlldr USERID=':UserName:'/':UserPassword:'@':DataBaseName:' CONTROL=':ControlFilePath:ControlFileName:' DATA=':DataFilePath:DataFileName:' LOG=':LogFilePath:LogFileName:'.log':' BAD=':BadFilePath:BadFileName:'.bad':' PARALLEL=':Parallel:' SKIP_UNUSABLE_INDEXES=':SkipUnusableIndexes:' SKIP_INDEX_MAINTENANCE=':SkipIndexMaintenence:' DIRECT=':DirectLoad:' ROWS=':CommitPoint:' ERRORS=':ErrorLimit:' SKIP=':Skip:' SILENT=HEADER,FEEDBACK'
Call DSLogInfo(RoutineName:' - Starting Sqlldr ':TimeDate(),RoutineName)
Call DSExecute("DOS",Command,Output,SystemReturnCode)
If SystemReturnCode = 0 Then
   * Return 0 so the caller's "If Output = 0" success test passes
   Ans=0
   Call DSLogInfo('Sqlldr Successfully Completed',RoutineName)
End Else
   Ans=SystemReturnCode
   Call DSLogInfo('Sqlldr Did Not Complete Successfully, Return Code = ':SystemReturnCode,RoutineName)
   Call DSLogFatal('ERROR: ':Output,RoutineName)
End
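For readers outside DataStage BASIC, the command that the ExecSqlldr routine assembles is an ordinary sqlldr invocation; here is a minimal Python sketch of the same string-building step (the argument values are illustrative, and real code should avoid putting the password on the command line):

```python
def sqlldr_command(oracle_home, user, password, db,
                   control, data, log, bad, skip=0):
    """Build the sqlldr command string with a SKIP offset for restarts."""
    return (f"{oracle_home}/bin/sqlldr USERID={user}/{password}@{db} "
            f"CONTROL={control} DATA={data} LOG={log}.log BAD={bad}.bad "
            f"DIRECT=TRUE SKIP={skip} SILENT=HEADER,FEEDBACK")

cmd = sqlldr_command("/opt/oracle", "scott", "tiger", "orcl",
                     "fact.ctl", "stg.dat", "fact_0101", "fact_0101",
                     skip=5000015)
print(cmd)
```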
Shantanu Choudhary
A small correction to the first code. I don't know how that happened, but in some places the index reference "Output<i>" got lost; after pasting it into the thread I can see only "Output".
Replace:
if Trim(Output) <> '' Then
OpenSeq LogFilePath:Output To FileVar Then
With:
if Trim(Output<i>) <> '' Then
OpenSeq LogFilePath:Output<i> To FileVar Then
Shantanu Choudhary
It's lost/vanished again; it's testing my patience...
Code:
Replace::
if Trim(Output) <> '' Then
OpenSeq LogFilePath:Output To FileVar Then
With:
if Trim(Output<i>) <> '' Then
OpenSeq LogFilePath:Output<i> To FileVar Then
Shantanu Choudhary