Best Way of Deleting Partially Loaded Data

Post questions here relating to DataStage Server Edition, for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Pavan_Yelugula
Premium Member
Posts: 133
Joined: Tue Nov 23, 2004 11:24 pm
Location: India

Best Way of Deleting Partially Loaded Data

Post by Pavan_Yelugula »

Hi All
What is the best way of deleting partially loaded data?
My scenario is this: I have four jobs that I call from a BASIC control job. Each job takes a file and loads it into a table. If any one of the jobs fails, I want to delete the data that was loaded into all four tables in that run. I have a timestamp column in each table to identify the records for that run.
Right now I am thinking of writing another job that takes the timestamp and deletes the data from the tables. Is there a cleaner way of doing this more gracefully? Something like committing to the database only after the fourth job, and rolling back if there is a failure.
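
To make the idea concrete, here is a rough sketch of the control job I have in mind, in DataStage BASIC. The job names, the LoadTimestamp parameter and the DeleteByTimestamp cleanup job are all invented for illustration:

Code:

* Run the four loads, tagging every row with this run's timestamp.
RunTS = Oconv(Date(), "D-YMD[4,2,2]") : " " : Oconv(Time(), "MTS")
Failed = @FALSE

For i = 1 To 4 Until Failed
   hJob = DSAttachJob("LoadJob" : i, DSJ.ERRFATAL)
   ErrCode = DSSetParam(hJob, "LoadTimestamp", RunTS)
   ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
   ErrCode = DSWaitForJob(hJob)
   Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
   ErrCode = DSDetachJob(hJob)
   If Status <> DSJS.RUNOK And Status <> DSJS.RUNWARN Then Failed = @TRUE
Next i

If Failed Then
   * Fire the cleanup job to delete rows stamped with this run's timestamp
   hDel = DSAttachJob("DeleteByTimestamp", DSJ.ERRFATAL)
   ErrCode = DSSetParam(hDel, "LoadTimestamp", RunTS)
   ErrCode = DSRunJob(hDel, DSJ.RUNNORMAL)
   ErrCode = DSWaitForJob(hDel)
   ErrCode = DSDetachJob(hDel)
End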

Any ideas would be really helpful.

Thanks
Pavan
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Have you tried a transaction size of 0? You will have to test it out with four database stages in a single job. If any one of them fails an insert, it will generate a warning; keep your warning limit at 1 so that the job is forced to abort. That way the transaction will be rolled back. The one thing you need to test is this: if the first three links have finished processing and the fourth link is still running when the abort happens, will it roll back the other three tables as well?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Pavan_Yelugula
Premium Member
Posts: 133
Joined: Tue Nov 23, 2004 11:24 pm
Location: India

Post by Pavan_Yelugula »

Hi DSguru,
If I understand you right, you want my design to be a single job that looks like the following, with the transaction-size and warning logic:
-----------------------
File1----->Table1
File2----->Table2
File3----->Table3
File4----->Table4
-----------------------

Right now I have these four steps in four different jobs. :(
That opens up another question I have run into before.

Last time, when I combined four loads like this into a single job even though they had no relation to one another, I was told the design was not good and not stable. Is that right? And is there some graceful way of doing the delete if I keep the four separate jobs?

Thanks
Pavan
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Actually, I presumed that you had them all in one job. You are correct, it's better to keep them separate. But if you have a requirement to roll back the data from all four tables, then I would say go for a single job. You need to test it first, though: test what I described in my first post.
If you have four different jobs, then you will have to issue an explicit delete command. There is no way to tie the four jobs together in one transaction.
What database is it?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Re: Best Way of Deleting Partially Loaded Data

Post by gateleys »

I don't know whether your design of deleting from multiple tables because of an error in one job is any good; I am not going to delve into that. The solution you are looking for could be:

* Let each job, Job1 through Job4, contain a 'before SQL' statement that deletes from its respective table based on a TimeStamp job parameter (see the sketch below this list).
* Let a MainSequence contain a subsequence SubSeq1 that holds Job1 through Job4.

* If any of Job1, Job2, Job3 or Job4 fails, let the Exception Handler take control and the Terminator abort SubSeq1.

* In the MainSequence, have two triggers coming out of SubSeq1:
OKLink - trigger is SubSeq1.$JobStatus = 1 OR SubSeq1.$JobStatus = 2
FailLink - trigger is 'Otherwise'

* Let the FailLink loop back to the start of SubSeq1, so that it fires the jobs again (deleting the just-inserted rows based on the TimeStamp job parameter).

NOTE: Having said that, I would rather test my jobs so that they do not fail due to physical rejects from the database.
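
As a sketch of the first bullet, the 'before SQL' on each job might look something like the following. The table and column names are hypothetical, and #LoadTimestamp# stands for whichever job parameter carries the run's timestamp (DataStage substitutes it into the statement):

Code:

DELETE FROM STG_TABLE1 WHERE LOAD_TS = '#LoadTimestamp#'

Each of the four jobs would carry the same statement against its own table.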

gateleys
Last edited by gateleys on Tue Mar 06, 2007 10:39 am, edited 1 time in total.
Pavan_Yelugula
Premium Member
Posts: 133
Joined: Tue Nov 23, 2004 11:24 pm
Location: India

Post by Pavan_Yelugula »

It is an Oracle database.
Thanks, Guru. I will try what you suggested.
While we are at it, why is the design wrong or unstable if we have four independent links running in one job? I don't get the logic of why it shouldn't be done. Even if we use a sequence to run these four jobs, the four Job Activity stages will be sitting on the canvas in parallel. How is that acceptable, and how is it so different from the earlier design?

Thanks
Pavan
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Basically, for restartability. Generally the loads are independent; a failure in one load will not affect the others. That is why it is a best practice to run them in separate jobs, so that if one fails, at least the others won't be affected.
In your case it's the opposite: you want the failure in one to affect the rest. That is why I suggested doing it in a single job.
The design is the outcome of the requirement, and your requirement calls for it.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
shawn_ramsey
Participant
Posts: 145
Joined: Fri May 02, 2003 9:59 am
Location: Seattle, Washington. USA

Post by shawn_ramsey »

We have a load ID column in our tables that we use for audit and validation processes. We generate an ID (which is logged in a control table), and all jobs in a sequence tag new rows with that load ID. That way we can use the ID for validation counts, sums and so on, as well as for bulk deletions after a failed load.
Shawn Ramsey

"It is a mistake to think you can solve any major problems just with potatoes."
-- Douglas Adams
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

The load ID/batch ID is used at many sites for the same reason. But even that would require a specific DELETE FROM <table> WHERE LOAD_ID = <some number> against each table.
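
Purely as an illustration, such a cleanup could be driven from BASIC along these lines. The table names, the DbLogin connect string and the sqlplus invocation are all hypothetical; the point is just that each table needs its own delete:

Code:

* Hypothetical cleanup: build a DELETE for each staging table and
* run it through sqlplus, assuming rows were tagged with LoadId.
Tables = "STG_CUST,STG_ORDER,STG_ITEM,STG_SHIP"
For i = 1 To 4
   Tbl = Field(Tables, ",", i)
   Sql = "DELETE FROM " : Tbl : " WHERE LOAD_ID = " : LoadId : ";"
   Cmd = "echo " : Quote(Sql) : " | sqlplus -s " : DbLogin
   Call DSExecute("UNIX", Cmd, Output, RetCode)
   If RetCode <> 0 Then Call DSLogWarn("Cleanup failed for " : Tbl, "CleanupLoop")
Next i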
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Pavan_Yelugula
Premium Member
Posts: 133
Joined: Tue Nov 23, 2004 11:24 pm
Location: India

Post by Pavan_Yelugula »

Hi Guru,
Thanks a lot for the info.

Gateleys, thanks a lot. I like the idea; it is clean and easy to implement as well. It would be really helpful if you could clarify the last part for me:

gateleys wrote: Let the FailLink loop back to the start of SubSeq1, so that it fires the jobs again (deleting the just-inserted rows based on the TimeStamp job parameter).

If any job fails, the failure link in the MainSequence will start SubSeq1 again. The before SQL will clean up the partially loaded data and the jobs will run again, right? So the failed job's before SQL will clean the data for that table, the job will run again, and it will fail again... Where does the loop break? It is an infinite loop, right? It keeps going on and on.


Thanks
Pavan
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

Pavan_Yelugula wrote: If any job fails, the failure link in the MainSequence will start SubSeq1 again. The before SQL will clean up the partially loaded data and the jobs will run again, right? So the failed job's before SQL will clean the data for that table, the job will run again, and it will fail again... Where does the loop break? It is an infinite loop, right? It keeps going on and on.

Thanks
Pavan
Pavan... point noted. Let the failure link invoke the Terminator to abort the MainSequence instead. Then, after the support personnel fix the job(s) and run the MainSequence again, the before SQL will kick in and delete from the concerned tables. And since the problem has been fixed, the OKLink trigger will fire in SubSeq1.

gateleys
Pavan_Yelugula
Premium Member
Posts: 133
Joined: Tue Nov 23, 2004 11:24 pm
Location: India

Post by Pavan_Yelugula »

Thanks for the reply, gateleys.

The timestamp applied to the tables is the current system date and time from when the BASIC control job runs. With this solution, I guess I need to hold these failed timestamps somewhere so that I can delete the data at a later point, since I am not doing the load and the delete (on failure) in a single sweep.
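
For instance, the control job could append the failed run's timestamp to a flat file that a later cleanup job reads. A rough sketch in BASIC, with the file path made up:

Code:

* Hypothetical: record the failed run's timestamp for later cleanup.
FailLog = "/data/etl/failed_load_timestamps.txt"
OpenSeq FailLog To FileVar Then
   Seek FileVar, 0, 2 Else Null ;* move to end of file so we append
End Else
   Create FileVar Else Call DSLogWarn("Cannot create " : FailLog, "Ctrl")
End
WriteSeq RunTS To FileVar Else Call DSLogWarn("Write failed", "Ctrl")
CloseSeq FileVar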

I will work with the idea you have given and try to adapt it to my scenario.

Thanks a lot, gateleys.


Thanks
Pavan
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

Pavan_Yelugula wrote: I will work with the idea you have given and try to adapt it to my scenario.

Thanks a lot, gateleys.
Thanks
Pavan
You are most welcome. Now, you may want to mark your post as 'Resolved' or 'Workaround'.

gateleys