optimum way to delete/insert

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

news78
Participant
Posts: 44
Joined: Fri Jul 07, 2006 1:37 pm

optimum way to delete/insert

Post by news78 »

Scenario:
Two tables, A and B, both have a column col1, and each contains approximately 2 million records. I need to delete all records from table B that match records in table A on col1, and then insert all records from A into B.

What's the fastest way to achieve this in a DataStage server job? The database is Oracle.
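
In plain Oracle SQL the requirement amounts to roughly this (just a sketch of the intent, assuming A and B have identical column lists; not a proposed job design):

Code: Select all

DELETE FROM B WHERE B.col1 IN (SELECT A.col1 FROM A);
INSERT INTO B SELECT * FROM A;
COMMIT;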

Options:
A. Create a sequence with two jobs:
J1 to delete the data and, when it succeeds, J2 to insert the data.
J1 (the "delete existing rows" option with col1 as the key in the DRS stage)
J2 (insert into B (select * from A))

B. Create a hashed file beforehand holding the distinct values of B.col1, then delete only those rows from B where there is a match between hashfile.col1 and A.col1.

How do I implement option B? Any other ideas?
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

One target stage with before-SQL:

Code: Select all

DELETE FROM B
WHERE B.col1 IN (SELECT A.col1 FROM A)
Set the update option to "Insert rows without clearing". The source will be your table A.
But that is going to take time for 2M records.
You need to go for TRUNCATE and BULK loads.
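
As a rough illustration in plain Oracle SQL (a sketch only; in a Server job the load would more typically be done with a bulk-loader stage, and note that TRUNCATE also removes the B rows that have no match in A, which the follow-up below addresses):

Code: Select all

-- Sketch of the TRUNCATE-and-bulk-load idea (assumptions as noted above)
TRUNCATE TABLE B;
INSERT /*+ APPEND */ INTO B   -- APPEND hint = direct-path (bulk-style) insert
SELECT * FROM A;
COMMIT;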
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
news78
Participant
Posts: 44
Joined: Fri Jul 07, 2006 1:37 pm

Post by news78 »

DSguru2B wrote: One target stage with before-SQL:

Code: Select all

DELETE FROM B
WHERE B.col1 IN (SELECT A.col1 FROM A)
Set the update option to "Insert rows without clearing". The source will be your table A.
But that is going to take time for 2M records.
You need to go for TRUNCATE and BULK loads.
I thought of this. But won't my option B be faster than this, since we already get the hashed file (distinct values) beforehand? Note that the column is not a primary key.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

If you already have a hashed file with distinct values then yes, go for that option.
To do that, take the distinct values from table A, do a lookup against table B, collect all the keys that exist in both, and pass them to the delete statement.
OR
you can do a lookup on table B and keep all the records from table B that do not get a hit. Save them to a flat file, issue a TRUNCATE against table B, load table A, and then load your flat file (rough SQL sketch below).
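
A rough SQL equivalent of that second route (a sketch, not the job itself; B_KEEP is a hypothetical staging table standing in for the flat file):

Code: Select all

-- 1. Preserve the rows of B that have no match in A (the flat file in the job)
CREATE TABLE B_KEEP AS
  SELECT * FROM B
  WHERE NOT EXISTS (SELECT 1 FROM A WHERE A.col1 = B.col1);

-- 2. Empty B quickly (DDL, cannot be rolled back)
TRUNCATE TABLE B;

-- 3. Reload: all of A first, then the preserved rows
INSERT INTO B SELECT * FROM A;
INSERT INTO B SELECT * FROM B_KEEP;
COMMIT;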
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
news78
Participant
Posts: 44
Joined: Fri Jul 07, 2006 1:37 pm

Post by news78 »

If my design is as follows:

[Transformer] > [DRS]

And all I am doing is issuing a delete query from the DRS stage (using the user-defined SQL tab; the database is Oracle):
"delete from A where date1 < somedate"

Will setting the Array Size and Transaction Size help with faster query execution?
Transaction Size, if set to say 1000, would mean a commit is issued after every 1000 rows are deleted.
Array Size: I am not sure whether Array Size is relevant here.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

That should be just fine. Make sure you create a dummy variable and set the constraint to @INROWNUM = 1 so that it doesn't run forever.
The array size wouldn't matter in this case.
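
For reference, a minimal sketch of the setup being described (settings shown as comments; "somedate" is the placeholder from the earlier post, not a real value):

Code: Select all

-- DRS stage, user-defined SQL tab:
DELETE FROM A WHERE date1 < somedate

-- Transformer stage feeding this link (as described above):
--   output link constraint:  @INROWNUM = 1   (fires the statement only once)
--   one dummy output column, e.g. DUMMY = 1   (hypothetical name)
-- Transaction Size controls how often a commit is issued;
-- Array Size makes no difference for a single user-defined statement.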
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.