Need to create a job that does logical deletes (update)
DataStage Version = 6 with PX
O/S - AIX 5
Hi,
We have a requirement where we basically need to do a logical delete of records from a target table (say table TGT) where records don't exist in a Source table (Table SRC).
We already have a Server Job that builds a hash file of the SRC records, uses it as a lookup in a Transformer stage, and does the logical delete (i.e. updates table TGT to set delete_flag = 'Y').
This job works fine, but for large tables the hash file creation takes a lot of time.
Is there a more efficient way to do this with PX?
We are new to PX and we have been experimenting with Change Capture and Change Apply but the jobs don't seem to run.
Thanks,
Surekha
Understanding the problem
So you basically want to perform an aggregate update:
Update TGT Set delete_flag='Y'
where not exists(select 1 from SRC where....)
Question: Why do you need a hash file?
You can do this by changing the update to user-defined SQL and inserting the not exists condition (assuming that SRC is indexed by the joined columns).
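The user-defined SQL approach can be sketched in miniature, using Python's sqlite3 as a stand-in for the Oracle target (the TGT/SRC table names and delete_flag/key_col columns follow the thread; key_col standing in for the join columns is an assumption):

```python
import sqlite3

# In-memory SQLite stand-in for the target database.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE SRC (key_col INTEGER PRIMARY KEY);
    CREATE TABLE TGT (key_col INTEGER PRIMARY KEY, delete_flag TEXT DEFAULT 'N');
    INSERT INTO SRC (key_col) VALUES (1), (2);
    INSERT INTO TGT (key_col) VALUES (1), (2), (3), (4);
""")

# The user-defined update: flag TGT rows that have no matching SRC row.
con.execute("""
    UPDATE TGT SET delete_flag = 'Y'
    WHERE NOT EXISTS (SELECT 1 FROM SRC WHERE SRC.key_col = TGT.key_col)
""")
con.commit()

flagged = [r[0] for r in con.execute(
    "SELECT key_col FROM TGT WHERE delete_flag = 'Y' ORDER BY key_col")]
print(flagged)  # keys 3 and 4 are missing from SRC, so they get flagged
```

With SRC indexed on the join columns, the database does the anti-join itself and no hash file build is needed at all.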
You can also multi thread this job without necessarily using PX although I believe PX will improve performance as well.
Idea
Yes if the tables are on physically separate DB's then a hash file is the solution.
With PX, you lose the capability of using hash files.
Are you re-creating the entire hash file every time you run?
It seems like your bottleneck is the hash file rebuild.
Why not skip clearing the hash file and simply update it (using a UV stage)?
Just load a work table of the keys from the source into your target instance. That should be obscenely easy and fast. Then, using parallel DML you can execute the DELETE FROM TGT WHERE TGT.KEY NOT IN (SELECT KEY FROM YOURWORKTABLE). Avoid spanning instances with dblinks.
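The work-table approach above can be sketched the same way (again sqlite3 standing in for Oracle; the WORK_KEYS name is hypothetical, and a real version would bulk-load the keys with SQL*Loader or an OCI load rather than row-by-row inserts):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE TGT (key_col INTEGER PRIMARY KEY, payload TEXT);
    INSERT INTO TGT (key_col, payload) VALUES (1,'a'), (2,'b'), (3,'c'), (4,'d');
    -- Work table holding just the source keys, loaded into the target instance.
    CREATE TABLE WORK_KEYS (key_col INTEGER PRIMARY KEY);
""")

source_keys = [1, 2]  # keys that still exist in SRC
con.executemany("INSERT INTO WORK_KEYS (key_col) VALUES (?)",
                [(k,) for k in source_keys])

# One set-based statement: everything not in the work table goes.
con.execute("DELETE FROM TGT WHERE key_col NOT IN (SELECT key_col FROM WORK_KEYS)")
con.commit()

remaining = [r[0] for r in con.execute("SELECT key_col FROM TGT ORDER BY key_col")]
print(remaining)  # only the keys present in the work table survive
```

The point of the design is that the database does one set operation instead of the ETL tool handing it keys one at a time.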
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
1stpoint and Kenneth,
Thanks for your replies.
I will check if I am rebuilding the hashfile every time.
Also, worktables were the first thing I thought of, but creating worktables/temp tables is something our Admins frown upon and is only to be done if everything else fails.
I did create a job in PX with just a Change_Capture stage going to a target Oracle stage where I do the logical delete (update TGT set delete_flag = 'Y' where key_col = etc). It seems to work once I set only 'Drop Output for Delete' to False and everything else to True.
-Surekha
If speed is important, then a parallel-query-based delete from a work table of keys is the fastest method for mass deleting. If you have a single-threaded process handing keys one at a time to the database to delete, you wait while each key is found in the table and deleted.
If the table is partitioned, and you don't supply a column that helps the optimizer identify the partition (partition pruning) you will hit every partition and it will be slower. You're going to have to keep issues like this in mind when deleting data. You may have to put more information into the DELETE statement to assist the database.
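The partition-pruning point can be illustrated in miniature (sqlite3 again, which has no real partitions; load_month is a hypothetical partition-key column, and the benefit in Oracle is that supplying it lets the optimizer touch one partition instead of all of them):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE TGT (key_col INTEGER, load_month TEXT,
                      PRIMARY KEY (key_col, load_month));
    INSERT INTO TGT VALUES (1,'2003-11'), (2,'2003-11'),
                           (2,'2003-12'), (3,'2003-12');
""")

# Without load_month in the predicate the database would have to check every
# partition for key_col = 2; supplying it narrows the work to one partition.
con.execute("DELETE FROM TGT WHERE key_col = ? AND load_month = ?",
            (2, "2003-12"))
con.commit()

rows = list(con.execute(
    "SELECT key_col, load_month FROM TGT ORDER BY key_col, load_month"))
print(rows)  # the (2, '2003-11') row in the other partition is untouched
```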
Kenneth Bland
huh
Ken,
He is UPDATING (logical delete), not deleting, so partitioning would be defeatist here. I do like the worktable idea as well. You can also trim down your hash file to just the key columns.
Re: Need to create a job that does logical deletes (update)
Upgrade to 7.0.1 as soon as possible (mid-December is its current scheduled release). This would help with the performance issues on Lookup stages.
Ideal solution:
Code:
Source Table
|
V
Target Table -> lookup -> dead link (copy stage)
|
V
(Reject) Delete table
Instead of the dead link and reject link, you could output the table to itself with the truncate option enabled. This is ideal if you have a LOT of deletes going on, where the above solution would take a while.
Naturally, the above solutions are dangerous if there is a failure. You need to ensure that you have a way to recover from a mistake/failure.
Lookup Table should have hash sort, and the input table should be "partitioned" (if Oracle, dunno about other databases). Always a good thing to analyze table afterward (Oracle-wise).
Naturally, there are other ways to handle it, but this is the best 'performance-wise', especially if your database is running on a smaller box than your DataStage server.
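The lookup-with-reject-link design above can be mimicked as a toy in-memory analogue (plain Python standing in for the PX stages; this is a sketch of the data flow, not DataStage code):

```python
# Target rows are streamed against the source keys held by the Lookup stage:
# matches go down the "dead link", misses come out the reject link as deletes.
src_keys = {1, 2}                                    # Source Table (the lookup)
tgt_rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]  # Target Table stream

matched = [row for row in tgt_rows if row[0] in src_keys]      # dead link (copy stage)
rejects = [row for row in tgt_rows if row[0] not in src_keys]  # reject link -> delete table

print([k for k, _ in rejects])  # these keys feed the delete/flag step
```

The reject link carries exactly the rows whose keys are absent from the source, which is the set the original poster wants to flag with delete_flag = 'Y'.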
-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
Re: huh
Okay, you got me, I wasn't paying total attention. Instead of a delete, it's an update.
1stpoint wrote: He is UPDATING (logical delete), not deleting, so partitioning would be defeatist here.
Now, can you tell me why a parallel DML query against a partitioned table is NOT the best solution if you are able to supply the partition key and are using locally managed indexes? (Big assumption on my part that this is a data warehouse and is following good DBA practices --> see Tom Kyte's Expert One-On-One book)
Kenneth Bland
That is definitely a good solution, assuming the table is partitioned and they supply the partition key. But if I recall the original question correctly, it wasn't the update operation that was slow, it was the hash build.