Scenario:
Two tables, A and B, each with a column col1. Both tables contain approximately 2 million records. I need to delete all records from table B that match table A on col1, then insert all records from A into B.
What's the fastest way to achieve this in a DataStage server job? The database is Oracle.
Options:
A. Create a sequence with two jobs.
J1 to delete data, and when this succeeds use J2 to insert data.
J1 (delete existing rows option with col1 as the key in DRS stage)
J2 (insert into B (select * from A))
B. Create a hash file beforehand holding the distinct values of B.col1, then delete only those rows from B where hashfile.col1 matches A.col1.
How do I implement option B? Any other ideas?
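Outside DataStage, the end state being asked for is just two SQL statements. Here is a minimal sketch of the set logic using Python's sqlite3 purely as a stand-in for Oracle (table and column names are taken from the scenario; the sample data is made up):

```python
import sqlite3

# Toy versions of tables A and B from the scenario; sqlite3 is used here
# only to illustrate the set logic, not Oracle-specific behaviour.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE A (col1 INTEGER);
    CREATE TABLE B (col1 INTEGER);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (2), (3), (4);
""")

# Step 1: delete every B row whose col1 matches a row in A.
con.execute("DELETE FROM B WHERE col1 IN (SELECT col1 FROM A)")

# Step 2: insert all rows from A into B.
con.execute("INSERT INTO B SELECT col1 FROM A")

print(sorted(r[0] for r in con.execute("SELECT col1 FROM B")))
# -> [1, 2, 3, 4]: the unmatched survivor (4) plus everything from A.
```

The question in the thread is really about which DataStage job design executes this pair of statements fastest at the 2M-row scale, not about the SQL itself.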
optimum way to delete/insert
One target stage with before-SQL; update option set to "Insert rows without clearing". Source will be your TABLE A.
Code:
DELETE FROM B
WHERE B.KEY IN (SELECT A.KEY FROM A)
But that is going to take time for 2M records.
You need to go for TRUNCATE and BULK loads.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
DSguru2B wrote:
One target stage with before-SQL; update option set to "Insert rows without clearing". Source will be your TABLE A.
Code:
DELETE FROM B WHERE B.KEY IN (SELECT A.KEY FROM A)
But that is going to take time for 2M records.
You need to go for TRUNCATE and BULK loads.

I thought of this. Won't my option B be faster than this, since we already get the hash file (distinct values) beforehand? Note that the column is not a primary key.
If you already have a hashed file with distinct values then yes, go for that option.
Also, you could take the distinct values from table A, do a lookup against table B, collect all the keys that exist, and pass them to the delete statement.
OR
you can do a lookup on table B and get all the records from table B that do not have a hit. Save them in a flat file. Pass a truncate command to table B. Load table A and then load your flat file.
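The truncate-and-reload variant described above can be sketched in the same way. Again sqlite3 only stands in for Oracle (sqlite has no TRUNCATE, so an unqualified DELETE plays that role), and an in-memory list plays the role of the flat file:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE A (col1 INTEGER);
    CREATE TABLE B (col1 INTEGER);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (2), (3);
""")

# Lookup: keep only the B rows with no hit in A
# (in the job design these would be written to the flat file).
survivors = list(con.execute(
    "SELECT col1 FROM B WHERE col1 NOT IN (SELECT col1 FROM A)"))

# "Truncate" B (DELETE without WHERE stands in for Oracle's TRUNCATE here).
con.execute("DELETE FROM B")

# Bulk-load A, then reload the saved survivors into B.
con.execute("INSERT INTO B SELECT col1 FROM A")
con.executemany("INSERT INTO B VALUES (?)", survivors)

print(sorted(r[0] for r in con.execute("SELECT col1 FROM B")))  # [1, 2, 3]
```

The appeal of this route is that TRUNCATE and a bulk load avoid per-row delete overhead entirely; the cost is the extra pass to save and reload the non-matching rows.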
If my design is as follows:
[Transformer] > [DRS]
And all I am doing is issuing a delete query from the DRS stage (using the user-defined SQL tab; the database is Oracle):
"delete from A where date1 < somedate"
Will setting the Array Size and Transaction Size help speed up query execution?
Transaction Size, if set to say 1000, will mean a commit is issued after every 1000 rows are deleted.
Array Size: I am not sure whether Array Size is relevant here.