Reg Deletion of Row in Target

Andal · Post by **Andal** » Fri Dec 16, 2005 12:46 am

I want to clarify some basic data warehouse (DWH) doubts. We are loading data from the source to the target through ETL.

In my case, for some reasons, I am deleting data in the source manually through the back end. Is there any way, through ETL, to delete those rows from my target as well?

I don't want the update action to be "Truncate and Insert", because I have millions of records in the source, so it would be time consuming.

The option that comes to mind is to keep a hash file of the target table and look it up against the source, then delete the records that do not match the source by writing a "DELETE FROM" user-defined SQL statement in the target.

Are there any other ways to achieve this?

loveojha2 · Post by **loveojha2** » Fri Dec 16, 2005 1:04 am

                              Source_Table
                                  |
                                  |
                                  |
                                  V
Target_Table----->SeqFile----->Trans------>Target_Table

Your update action should be "Delete existing rows only". Pass only the rows that do not match the source.
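For readers who think in SQL, the net effect of that job can be sketched as a single anti-join delete. This is only an illustration in Python with an in-memory SQLite database, not DataStage job code; the table names `source_table` and `target_table`, the key column `id`, and the helper names are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
DELETE_UNMATCHED = """
DELETE FROM target_table
WHERE NOT EXISTS (
    SELECT 1 FROM source_table s WHERE s.id = target_table.id
)
"""


def delete_unmatched(conn: sqlite3.Connection) -> int:
    """Delete target rows whose key is absent from the source; return the count."""
    cur = conn.execute(DELETE_UNMATCHED)
    conn.commit()
    return cur.rowcount


def build_demo() -> sqlite3.Connection:
    """Create a tiny source/target pair where row 2 was removed from the source."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source_table (id INTEGER PRIMARY KEY, val TEXT);
        CREATE TABLE target_table (id INTEGER PRIMARY KEY, val TEXT);
        INSERT INTO source_table VALUES (1, 'a'), (3, 'c');
        INSERT INTO target_table VALUES (1, 'a'), (2, 'b'), (3, 'c');
    """)
    return conn
```

Calling `delete_unmatched(build_demo())` removes only the target row whose key no longer exists in the source. Note that this form assumes both tables are reachable from the same database connection, which may not hold in a real source-to-warehouse setup.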

Hope this will help you.

Andal · Post by **Andal** » Fri Dec 16, 2005 1:29 am

Love,

The problem with this approach is that the source is time-bound. Say I am maintaining records for only one year in the source, while our DWH table contains history records.

In this case, the mentioned approach will fail.

loveojha2 · Post by **loveojha2** » Fri Dec 16, 2005 2:28 am

Andal wrote: Love,

The problem with this approach is that the source is time-bound. Say I am maintaining records for only one year in the source, while our DWH table contains history records.

In this case, the mentioned approach will fail.

In that case, select only those rows from the target that fall in the current year (or whichever period your source table contains). You can pass this information in as a job parameter and use it in your selection criteria.
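A minimal sketch of that windowed selection, again in Python with SQLite rather than DataStage itself. The `txn_date` column, the table names, and the two window arguments (standing in for the job parameter) are assumptions for illustration; dates are stored as ISO strings so that plain string comparison orders them correctly.

```python
import sqlite3


def delete_unmatched_in_window(conn, window_start, window_end):
    """Delete unmatched target rows only inside [window_start, window_end).

    Rows dated outside the window (history) are never candidates for deletion.
    """
    cur = conn.execute(
        """
        DELETE FROM target_table
        WHERE txn_date >= ? AND txn_date < ?
          AND NOT EXISTS (
              SELECT 1 FROM source_table s WHERE s.id = target_table.id
          )
        """,
        (window_start, window_end),
    )
    conn.commit()
    return cur.rowcount


def build_demo():
    """Source keeps only 2005 rows; target also holds a 2003 history row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source_table (id INTEGER PRIMARY KEY, txn_date TEXT);
        CREATE TABLE target_table (id INTEGER PRIMARY KEY, txn_date TEXT);
        INSERT INTO source_table VALUES (10, '2005-02-01'), (11, '2005-03-01');
        INSERT INTO target_table VALUES
            (1, '2003-05-01'),   -- history row, no longer in the source
            (10, '2005-02-01'),
            (11, '2005-03-01'),
            (12, '2005-04-01');  -- deleted manually from the source
    """)
    return conn
```

With the window set to `'2005-01-01'` and `'2006-01-01'`, row 12 is deleted while the 2003 history row is left alone, even though neither exists in the source.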

loveojha2 · Post by **loveojha2** » Fri Dec 16, 2005 2:53 am

Or you can make the source table the primary input and the target table the lookup. This should also work well with the same update strategy.

ameyvaidya · Post by **ameyvaidya** » Fri Dec 16, 2005 11:20 am

The requirement as I understand it:

Delete all rows from the target that are no longer present in the source.

loveojha2 wrote:
                              Source_Table
                                  |
                                  |
                                  |
                                  V
Target_Table----->SeqFile----->Trans------>Target_Table
Your update action should be "Delete existing rows only". Pass only the rows that do not match the source.

Hope this will help you.

This should work for history rows too. If you pass only the rows that don't match the source, the history rows will not match the source either, so they will go to the final target stage and get deleted.

I recommend a slight modification, though:

Source_Table----->Source_Hash_File
                                |
                                |
                                |
                                V
Target_Table----------------->Trans------>Target_Table
(Uncommitted read)
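
The lookup logic in that design can be sketched outside DataStage as follows. Here a Python set stands in for the source hash file and a loop over target keys stands in for the Transformer constraint; the table names, the `id` key, and both function names are hypothetical, and for millions of rows a real job would rely on the hash file rather than an in-memory set.

```python
import sqlite3


def keys_missing_from_source(conn):
    """Return target keys that fail the lookup against the source key set."""
    # Stand-in for the hash file: every source key loaded for fast lookup.
    source_keys = {row[0] for row in conn.execute("SELECT id FROM source_table")}
    # Stand-in for the Transformer: pass on only target keys with no match.
    return [
        row[0]
        for row in conn.execute("SELECT id FROM target_table ORDER BY id")
        if row[0] not in source_keys
    ]


def delete_existing_rows(conn, keys):
    """Delete the target rows for the given keys, one statement per key."""
    conn.executemany(
        "DELETE FROM target_table WHERE id = ?", [(k,) for k in keys]
    )
    conn.commit()


def build_demo():
    """Tiny source/target pair where row 2 was removed from the source."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source_table (id INTEGER PRIMARY KEY, val TEXT);
        CREATE TABLE target_table (id INTEGER PRIMARY KEY, val TEXT);
        INSERT INTO source_table VALUES (1, 'a'), (3, 'c');
        INSERT INTO target_table VALUES (1, 'a'), (2, 'b'), (3, 'c');
    """)
    return conn
```

Only the keys need to travel down the link, since a delete by primary key does not use the other columns.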

kcbland · Post by **kcbland** » Fri Dec 16, 2005 11:26 am

1. Physical deletes do not reclaim space; you'll still have gaps where the rows "used" to be. A column that indicates the row is "voided" is the preferable architectural solution: rather than delete, you update this indicator column. By excluding rows that have it set, you avoid all of the nasty problems that occur, especially when the wrong rows are deleted and you need to recover them.

2. If you persist with deletes, gather the primary keys for deletion. Consider using those primary keys to first spool the rows targeted for deletion to a file; in case of errant processing, you'll be able to identify and recover the deleted rows. Then use the suggested "Delete existing rows" SQL in a simple job.
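
Both suggestions can be sketched roughly as below, using Python with SQLite as a stand-in for the warehouse. The `voided` indicator column, the table layout, the CSV spool format, and the function names are all assumptions chosen for the example, not something prescribed in this thread.

```python
import csv
import sqlite3


def void_rows(conn, keys):
    """Option 1: flag rows as voided instead of physically deleting them."""
    conn.executemany(
        "UPDATE target_table SET voided = 1 WHERE id = ?", [(k,) for k in keys]
    )
    conn.commit()


def spool_then_delete(conn, keys, spool_path):
    """Option 2: write the doomed rows to a file first, then delete them."""
    with open(spool_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["id", "val", "voided"])
        for k in keys:
            row = conn.execute(
                "SELECT id, val, voided FROM target_table WHERE id = ?", (k,)
            ).fetchone()
            if row is not None:
                writer.writerow(row)
    # Delete only after the spool file has been written and closed.
    conn.executemany(
        "DELETE FROM target_table WHERE id = ?", [(k,) for k in keys]
    )
    conn.commit()


def build_demo():
    """Target table with a hypothetical 'voided' indicator column."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE target_table (
            id INTEGER PRIMARY KEY, val TEXT, voided INTEGER DEFAULT 0
        );
        INSERT INTO target_table (id, val) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    """)
    return conn
```

With the indicator approach, downstream queries add `WHERE voided = 0`, and recovering from a mistake is a matter of resetting the flag. With the spool approach, recovery means reloading rows from the file, so the file should be kept until the delete has been verified.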