optimum way to delete/insert

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

news78
Participant
Posts: 44
Joined: Fri Jul 07, 2006 1:37 pm

optimum way to delete/insert

Post by news78 »

Scenario:
Two tables, A and B, both have a column col1, and each contains approximately 2 million records. I need to delete all records from table B that match records in table A on col1, and then insert all records from A into B.

What's the fastest way to achieve this in a DataStage server job? The database is Oracle.
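
In plain Oracle SQL the requirement amounts to roughly this (just a sketch of the intent, assuming A and B have identical column lists; not a proposed job design):

Code: Select all

DELETE FROM B WHERE B.col1 IN (SELECT A.col1 FROM A);
INSERT INTO B SELECT * FROM A;
COMMIT;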

Options:
A. Create a sequence with two jobs:
J1 to delete the data and, when it succeeds, J2 to insert the data.
J1 (the "delete existing rows" option with col1 as the key in the DRS stage)
J2 (insert into B (select * from A))

B. Create a hashed file beforehand holding the distinct values of B.col1, then delete only those rows from B where there is a match between hashfile.col1 and A.col1.

How do I implement option B? Any other ideas?
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

One target stage with before-SQL:

Code: Select all

DELETE FROM B
WHERE B.col1 IN (SELECT A.col1 FROM A)
Set the update option to "Insert rows without clearing". The source will be your table A.
But that is going to take time for 2M records.
You need to go for TRUNCATE and BULK loads.
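
As a rough illustration in plain Oracle SQL (a sketch only; in a Server job the load would more typically be done with a bulk-loader stage, and note that TRUNCATE also removes the B rows that have no match in A, which the follow-up below addresses):

Code: Select all

-- Sketch of the TRUNCATE-and-bulk-load idea (assumptions as noted above)
TRUNCATE TABLE B;
INSERT /*+ APPEND */ INTO B   -- APPEND hint = direct-path (bulk-style) insert
SELECT * FROM A;
COMMIT;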
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
news78
Participant
Posts: 44
Joined: Fri Jul 07, 2006 1:37 pm

Post by news78 »

DSguru2B wrote: One target stage with before-SQL:

Code: Select all

DELETE FROM B
WHERE B.col1 IN (SELECT A.col1 FROM A)
Set the update option to "Insert rows without clearing". The source will be your table A.
But that is going to take time for 2M records.
You need to go for TRUNCATE and BULK loads.
I thought of this. But won't my option B be faster than this, since we already get the hashed file (distinct values) beforehand? Note that the column is not a primary key.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

If you already have a hashed file with distinct values then yes, go for that option.
To do that, take the distinct values from table A, do a lookup against table B, collect all the keys that exist in both, and pass them to the delete statement.
OR
you can do a lookup on table B and keep all the records from table B that do not get a hit. Save them to a flat file, issue a TRUNCATE against table B, load table A, and then load your flat file (rough SQL sketch below).
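
A rough SQL equivalent of that second route (a sketch, not the job itself; B_KEEP is a hypothetical staging table standing in for the flat file):

Code: Select all

-- 1. Preserve the rows of B that have no match in A (the flat file in the job)
CREATE TABLE B_KEEP AS
  SELECT * FROM B
  WHERE NOT EXISTS (SELECT 1 FROM A WHERE A.col1 = B.col1);

-- 2. Empty B quickly (DDL, cannot be rolled back)
TRUNCATE TABLE B;

-- 3. Reload: all of A first, then the preserved rows
INSERT INTO B SELECT * FROM A;
INSERT INTO B SELECT * FROM B_KEEP;
COMMIT;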
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
news78
Participant
Posts: 44
Joined: Fri Jul 07, 2006 1:37 pm

Post by news78 »

If my design is as follows:

[Transformer] > [DRS]

And all I am doing is issuing a delete query from the DRS stage (using the user-defined SQL tab; the database is Oracle):
"delete from A where date1 < somedate"

Will setting the Array Size and Transaction Size help with faster query execution?
Transaction Size, if set to say 1000, would mean a commit is issued after every 1000 rows are deleted.
Array Size: I am not sure whether Array Size is relevant here.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

That should be just fine. Make sure you create a dummy variable and set the constraint to @INROWNUM = 1 so that it doesn't run forever.
The array size wouldn't matter in this case.
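
For reference, a minimal sketch of the setup being described (settings shown as comments; "somedate" is the placeholder from the earlier post, not a real value):

Code: Select all

-- DRS stage, user-defined SQL tab:
DELETE FROM A WHERE date1 < somedate

-- Transformer stage feeding this link (as described above):
--   output link constraint:  @INROWNUM = 1   (fires the statement only once)
--   one dummy output column, e.g. DUMMY = 1   (hypothetical name)
-- Transaction Size controls how often a commit is issued;
-- Array Size makes no difference for a single user-defined statement.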
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.