
Best way of Deleting Partially loaded Data

Posted: Tue Mar 06, 2007 10:06 am
by Pavan_Yelugula
Hi All
What is the best way of deleting partially loaded data?
My scenario goes like this: I have four jobs that I call from a BASIC job control routine. Each job takes a file and loads it into a table. If any one of the jobs fails, I want to delete the data that was loaded into all four tables in that run. I have a timestamp column in the tables to identify the records for that run.
Right now I am thinking of writing another job that will take the timestamp and delete the data from the tables. Is there a cleaner, more graceful way of doing this? Something like committing to the database only after the fourth job, and rolling back if there is a failure.
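In other words, the extra job would just issue deletes keyed on the run's timestamp, something like this (the table and column names here are only placeholders for my own):

Code: Select all

-- Rough sketch of the cleanup job's SQL, assuming a hypothetical
-- LOAD_TS column that holds the run's timestamp, supplied through
-- a DataStage job parameter #RunTimeStamp#:
DELETE FROM TARGET_TABLE1
WHERE LOAD_TS = TO_TIMESTAMP('#RunTimeStamp#', 'YYYY-MM-DD HH24:MI:SS');
-- ...and likewise for TARGET_TABLE2 through TARGET_TABLE4.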

Any ideas would be really helpful.

Thanks
Pavan

Posted: Tue Mar 06, 2007 10:12 am
by DSguru2B
Have you tried a transaction size of 0? You will have to test it out with four database stages in a single job. If any one of them fails an insert, it will generate a warning; keep your warning limit at 1 so that the job is forced to abort. That way the transaction will be rolled back. The one thing you need to test is this: if the first three links finish processing while the fourth link is still running and hits the abort, will it roll back the other three tables as well?

Posted: Tue Mar 06, 2007 10:26 am
by Pavan_Yelugula
Hi DSguru
If I understand it right, you want my job design to look as follows: one job with the design below, plus the transaction size and warning logic.
-----------------------
File1----->Table1
File2----->Table2
File3----->Table3
File4----->Table4
-----------------------

Right now I have these four steps in four different jobs. :(
That opens another question I have faced.

Last time, when I designed a job as above, combining four jobs that have no relation to each other into a single job, I was told the design was not good and not stable. Is that right? Is there some graceful way of doing the delete if I have four different jobs?

Thanks
Pavan

Posted: Tue Mar 06, 2007 10:28 am
by DSguru2B
Actually, I presumed that you had them all in one job. You are correct, it's better to keep them separate. But if you have a requirement to roll back the data from all four tables, then I would say go for a single job. You need to test it first, though; test what I said in my first post.
If you have four different jobs, then you will have to issue an explicit delete command. There is no way you can relate the four jobs.
What database is it?

Re: Best way of Deleting Partially loaded Data

Posted: Tue Mar 06, 2007 10:37 am
by gateleys
I don't know whether your design of deleting multiple tables because of an error in one job is any good; however, I am not going to delve into that. The solution you are looking for could be:

* Let each job, Job1 through Job4, contain a 'before SQL' statement that deletes from its respective table based on a TimeStamp job parameter (see the sketch after this list).
* Let MainSequence contain a subsequence SubSeq1 that contains Job1 through Job4.

* If any of Job1, Job2, Job3, or Job4 fails, let an Exception Handler take control and a Terminator abort SubSeq1.

* In MainSequence, have 2 triggers coming from SubSeq1:
OKLink - trigger is SubSeq1.$JobStatus = 1 OR SubSeq1.$JobStatus = 2
FailLink - trigger is 'Otherwise'

* Let the FailLink loop back to the start of SubSeq1, so that it fires the jobs again (deleting the just-inserted rows based on the TimeStamp job parameters).
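For the before SQL, each job would run its own delete before loading; for example, Job1's Oracle stage might carry something like this (the table and column names are just placeholders for your own):

Code: Select all

-- Sketch of Job1's 'before SQL', assuming a hypothetical LOAD_TS
-- column and a TimeStamp job parameter; DataStage substitutes
-- #TimeStamp# into the statement at run time:
DELETE FROM TARGET_TABLE1
WHERE LOAD_TS = TO_TIMESTAMP('#TimeStamp#', 'YYYY-MM-DD HH24:MI:SS');

Since no rows carry this run's timestamp the first time through, the delete is harmless on a clean run and only does real work on a rerun.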


NOTE: Having said that, I would rather test my jobs so that they do not fail due to physical rejects from the database.

gateleys

Posted: Tue Mar 06, 2007 10:38 am
by Pavan_Yelugula
It is an Oracle database.
Thanks, Guru, I will try what you suggested.
By the way, why is the design wrong or unstable if we have four independent links running in a job? I don't get the logic of why it shouldn't be done. Even if we have a sequence to run these four jobs, the four Job Activity stages will be sitting on the canvas in parallel. How is that any better? And how is that so different from the earlier design?

Thanks
Pavan

Posted: Tue Mar 06, 2007 10:44 am
by DSguru2B
Basically, for restartability. Generally, the loads are independent; a failure in one load will not affect the others. That's why it's a best practice to run them as separate jobs, so that if one fails, at least the others won't be affected.
In your case it's the opposite: you want the failure in one to affect the rest. That's why I suggested doing it in a single job.
The design is the outcome of the requirement, and your requirement calls for it.

Posted: Tue Mar 06, 2007 10:53 am
by shawn_ramsey
We have a load ID column in our tables that we use for audit and validation processes. We generate an ID (which is logged in a control table), and all jobs in a sequence tag new rows with that load ID. That way we can use the ID for validation counts, sums, etc., as well as for bulk deletions on a failed load.

Posted: Tue Mar 06, 2007 10:55 am
by DSguru2B
The load ID/batch ID is used at many sites for the same reason. But even that requires an explicit DELETE FROM <table> WHERE LOAD_ID = <some number>.
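Roughly like this, with placeholder names (LOAD_CONTROL, TARGET_TABLE1, LOAD_ID); the real control table and columns depend on the site's conventions:

Code: Select all

-- Hypothetical control table that logs an ID for each run:
CREATE TABLE LOAD_CONTROL (
    LOAD_ID NUMBER PRIMARY KEY,
    LOAD_TS TIMESTAMP DEFAULT SYSTIMESTAMP,
    STATUS  VARCHAR2(10)      -- e.g. RUNNING / OK / FAILED
);

-- Every load job tags its rows with the run's ID, passed in as a
-- job parameter, so each target table carries a LOAD_ID column.

-- On a failed run, one bulk delete per table removes that run's rows:
DELETE FROM TARGET_TABLE1 WHERE LOAD_ID = 42;  -- the failed run's ID
DELETE FROM TARGET_TABLE2 WHERE LOAD_ID = 42;
DELETE FROM TARGET_TABLE3 WHERE LOAD_ID = 42;
DELETE FROM TARGET_TABLE4 WHERE LOAD_ID = 42;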

Posted: Tue Mar 06, 2007 10:57 am
by Pavan_Yelugula
Hi Guru,
Thanks a lot for the info.
Gateleys,
Thanks a lot, I like the idea; it is clean and easy to implement as well. It would be really helpful if you could just clarify the last part for me.

Code: Select all

Let the FailLink loop back to the start of SubSeq1, so that it fires the jobs again (deleting the just-inserted rows based on the TimeStamp job parameters).
If any job fails, then the failure link in the main sequence will start SubSeq1 again. The before SQL will clean up the partially loaded data and the jobs will start again, right? The failed job's before SQL will clean the data for that table, the job will start again, and it will fail again... Where does the loop break? It is an infinite loop, right? It just keeps going on and on.


Thanks
Pavan

Posted: Tue Mar 06, 2007 12:19 pm
by gateleys
Pavan_Yelugula wrote: If any job fails, then the failure link in the main sequence will start SubSeq1 again. The before SQL will clean up the partially loaded data and the jobs will start again, right? The failed job's before SQL will clean the data for that table, the job will start again, and it will fail again... Where does the loop break? It is an infinite loop, right? It just keeps going on and on.

Thanks
Pavan
Pavan... point noted. Let the failure link invoke the Terminator to abort the MainSequence instead. Then, after the support personnel fix the job(s) and run the MainSequence again, the before SQL will kick in to delete from the tables concerned. And since the problem has been fixed, the OKLink trigger will fire in SubSeq1.

gateleys

Posted: Tue Mar 06, 2007 12:44 pm
by Pavan_Yelugula
Thanks for the reply, gateleys.

The timestamp applied in the tables is the current system date and time at the moment the BASIC job control runs. With this solution, I guess I need to hold these failed timestamps somewhere so I can delete the data at a later point, since I am not doing the load and the delete (on failure) in a single sweep.
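Maybe a small control table would do, along the lines of the load ID idea above but keyed on the timestamp; just a rough idea with placeholder names:

Code: Select all

-- Hypothetical table for remembering each run's timestamp and outcome;
-- the job control inserts a row at the start of a run and marks it
-- FAILED on abort:
CREATE TABLE RUN_CONTROL (
    RUN_TS TIMESTAMP PRIMARY KEY,
    STATUS VARCHAR2(10)     -- RUNNING / OK / FAILED
);

-- The before SQL can then clean up every failed run's rows:
DELETE FROM TARGET_TABLE1
WHERE LOAD_TS IN (SELECT RUN_TS FROM RUN_CONTROL WHERE STATUS = 'FAILED');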

I will work with the idea you have given and try to use it for my scenario.

Thanks a lot, gateleys.


Thanks
Pavan

Posted: Tue Mar 06, 2007 1:34 pm
by gateleys
Pavan_Yelugula wrote: I will work with the idea you have given and try to use it for my scenario.

Thanks a lot, gateleys.
Thanks
Pavan
You are most welcome. Now, you may want to mark your post as 'Resolved' or 'Workaround'.

gateleys