Reg Deletion of Row in Target

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
Andal
Participant
Posts: 124
Joined: Thu Dec 02, 2004 6:24 am
Location: Bangalore, India

Reg Deletion of Row in Target

Post by Andal »

I want to clarify some basic DWH doubts. From Source , we are loading the data to target through ETL.

In my case, for some reasons i am deleting the data in the Source manually going to Back end. Is there any way through ETL to delete those rows from my Target also.

I dont want the update action to be "Truncate and Insert", because i am having millions of records in Source. So it is time consuming.

The option which comes to my mind is, Have a Hash file of the target table and lookup with source. Delete the records which is not matching with Source by writing a "Delete from "Userdefined Sql in Target.

Is there any other ways to achieve this.
Rgds
Anand
loveojha2
Participant
Posts: 362
Joined: Thu May 26, 2005 12:59 am

Post by loveojha2 »

Code: Select all

                              Source_Table
                                  |
                                  |
                                  |
                                  V
Target_Table----->SeqFile----->Trans------>Target_Table
Your update action should be Delete existing rows only. Pass only the rows which are not matching.

Hope this will help you.
Success consists of getting up just one more time than you fall.
Andal
Participant
Posts: 124
Joined: Thu Dec 02, 2004 6:24 am
Location: Bangalore, India

Post by Andal »

Love,

The problem with this approach is, The Source will be time related. Say i am maintaing records only for a year in Source and Our DWH table will contain History records.

In this case, the mentioed approach will fail.
Rgds
Anand
loveojha2
Participant
Posts: 362
Joined: Thu May 26, 2005 12:59 am

Post by loveojha2 »

Andal wrote:Love,

The problem with this approach is, The Source will be time related. Say i am maintaing records only for a year in Source and Our DWH table will contain History records.

In this case, the mentioed approach will fail.
In that case select only those rows from the target which fall in the current year (or for the year that your source table is containing). You can pass this information by means of a job parameter and use this within your selection criteria.
Success consists of getting up just one more time than you fall.
loveojha2
Participant
Posts: 362
Joined: Thu May 26, 2005 12:59 am

Post by loveojha2 »

Or you can make the source table as the primary input and target table as the lookup. This should also work well with the same update strategy.
Success consists of getting up just one more time than you fall.
ameyvaidya
Charter Member
Charter Member
Posts: 166
Joined: Wed Mar 16, 2005 6:52 am
Location: Mumbai, India

Post by ameyvaidya »

The requirement as I understand it:

Delete all rows from target that are not as of now present in the source.

loveojha2 wrote:

Code: Select all

                              Source_Table
                                  |
                                  |
                                  |
                                  V
Target_Table----->SeqFile----->Trans------>Target_Table
Your update action should be Delete existing rows only. Pass only the rows which are not matching.

Hope this will help you.
This should work for history rows too. If you pass only the rows that dont match with the source, The history rows will not match with the source and so will go to the final target stage and get deleted.

Recomend a slight modification though:

Code: Select all

Source_Table----->Source_Hash_File
                                |
                                |
                                |
                                V
Target_Table----------------->Trans------>Target_Table
(Uncommitted read)
Amey Vaidya<i>
I am rarely happier than when spending an entire day programming my computer to perform automatically a task that it would otherwise take me a good ten seconds to do by hand.</i>
<i>- Douglas Adams</i>
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

1. Physical deletes do not reclaim space, you'll still have gaps where the rows "used" to be. A column that indicates the row is "voided" is the preferable architectural solution. Rather than delete, you could update this indicator column. By excluding rows that have this set, you avoid all of the nasty problems that occur, especially when the wrong rows are deleted and you need to recover them.

2. As for deletes if you persist, gather the primary keys for deletion. Consider using those primary keys to first spool the rows targeted for deletion to a file incase of errant processing, you'll be able to know/recover the deleted rows. Use the suggested DELETE EXISTING ROWS SQL in a simple job.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Post Reply