Need to create a job that does logical deletes (update)

Su
Participant
Posts: 13
Joined: Thu Sep 26, 2002 7:16 am

Need to create a job that does logical deletes (update)

Post by Su »

DataStage Version = 6 with PX
O/S - AIX 5

Hi,
We have a requirement where we basically need to do a logical delete of records from a target table (say table TGT) where records don't exist in a Source table (Table SRC).

We already have a Server Job that creates a hash file of the SRC records, uses it as a lookup in a Transformer stage, and does the logical delete (i.e. update table TGT and set delete_flag = 'Y').

This job works fine, but for large tables the hash file creation takes a lot of time.

Wondering if there is an efficient way to do this with PX?
We are new to PX and have been experimenting with Change Capture and Change Apply, but the jobs don't seem to run.

Thanks,
Surekha
1stpoint
Participant
Posts: 165
Joined: Thu Nov 13, 2003 2:10 pm
Contact:

Understanding the problem

Post by 1stpoint »

Su wrote: Hi,
We have a requirement where we basically need to do a logical delete of records from a target table (say table TGT) where records don't exist in a Source table (Table SRC).

We already have a Server Job that creates a hash file of the SRC records, uses it as a lookup in a Transformer stage, and does the logical delete (i.e. update table TGT and set delete_flag = 'Y').

This job works fine, but for large tables the hash file creation takes a lot of time.

So you basically want to perform a set-based update:

Update TGT Set delete_flag='Y'
where not exists(select 1 from SRC where....)

Question: Why do you need a hash file?

You can do this by changing the update to user-defined SQL and inserting the not exists condition (assuming that SRC is indexed by the joined columns).
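
For illustration, here is a minimal sketch of what that user-defined SQL could look like, assuming both tables are visible to the same database and join on a single hypothetical key column KEY_COL (substitute the real join columns):

Code: Select all

-- set-based logical delete; KEY_COL is a placeholder for the real join key(s)
UPDATE TGT
   SET delete_flag = 'Y'
 WHERE NOT EXISTS (SELECT 1
                     FROM SRC
                    WHERE SRC.KEY_COL = TGT.KEY_COL);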

You can also multi-thread this job without necessarily using PX, although I believe PX will improve performance as well.
Su
Participant
Posts: 13
Joined: Thu Sep 26, 2002 7:16 am

Post by Su »

Thanks, but the 'not exists' won't be possible because the tables are not in the same database.

I forgot to mention in my initial post that the TGT and SRC tables are on different databases/servers.
Su
Participant
Posts: 13
Joined: Thu Sep 26, 2002 7:16 am

Post by Su »

Also wanted to mention that we do not want to use DB links.

I am not sure if PX lets you use DB links.
1stpoint
Participant
Posts: 165
Joined: Thu Nov 13, 2003 2:10 pm
Contact:

Idea

Post by 1stpoint »

Yes, if the tables are on physically separate DBs then a hash file is the solution.

With PX, you lose the capability of using hash files.

Are you re-creating the entire hash file every time you run?

It seems like your bottleneck is in the hash file rebuild.

Why not skip clearing the hash file and simply update it incrementally (using a UV stage)?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Just load a work table of the keys from the source into your target instance. That should be obscenely easy and fast. Then, using parallel DML you can execute the DELETE FROM TGT WHERE TGT.KEY NOT IN (SELECT KEY FROM YOURWORKTABLE). Avoid spanning instances with dblinks.
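
A rough sketch of that pattern, assuming an Oracle target, a hypothetical work table WORK_SRC_KEYS, a single key column KEY_COL, and an arbitrary parallel degree (all placeholders):

Code: Select all

-- load the source keys into a work table in the target instance (via a simple job),
-- then do the whole delete as one parallel, set-based statement
ALTER SESSION ENABLE PARALLEL DML;

DELETE /*+ PARALLEL(TGT, 4) */ FROM TGT
 WHERE TGT.KEY_COL NOT IN (SELECT KEY_COL FROM WORK_SRC_KEYS);

COMMIT;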
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Su
Participant
Posts: 13
Joined: Thu Sep 26, 2002 7:16 am

Post by Su »

1stpoint and Kenneth,

Thanks for your replies.

I will check if I am rebuilding the hash file every time.

Also, work tables were the first thing I thought of, but creating work tables/temp tables is something our admins frown upon and is only to be done if everything else fails.

I did create a job in PX with just a Change Capture stage going to a target Oracle stage where I do the logical delete (update TGT set delete_flag = 'Y' where key_col = etc.). It seems to work once I set only 'Drop Output for Delete' to False and everything else to True.
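
For reference, a minimal sketch of the user-defined update SQL that target Oracle stage could run; KEY_COL is a placeholder for the real key column, and the ORCHESTRATE.column_name placeholder syntax is an assumption about the PX Oracle stage, so check it against your stage documentation:

Code: Select all

-- logical delete applied for each row coming out of the Change Capture stage;
-- KEY_COL and the ORCHESTRATE placeholder are illustrative, not verified
UPDATE TGT
   SET delete_flag = 'Y'
 WHERE KEY_COL = ORCHESTRATE.KEY_COL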

-Surekha
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

If speed is important, then a parallel-query-based delete from a work table of keys is the fastest method for mass deleting. If you have a single-threaded process handing keys one at a time to the database to delete, you wait while each key is found in the table and deleted.

If the table is partitioned and you don't supply a column that helps the optimizer identify the partition (partition pruning), you will hit every partition and it will be slower. You're going to have to keep issues like this in mind when deleting data. You may have to put more information into the DELETE statement to assist the database.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
1stpoint
Participant
Posts: 165
Joined: Thu Nov 13, 2003 2:10 pm
Contact:

huh

Post by 1stpoint »

Ken,
He is UPDATING (logical delete), not deleting, so partitioning would be defeatist here. I do like the work table idea as well. You can also trim down your hash file to just the key columns.
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Re: Need to create a job that does logical deletes (update)

Post by Teej »

Su wrote: DataStage Version = 6 with PX
O/S - AIX 5

Hi,
We have a requirement where we basically need to do a logical delete of records from a target table (say table TGT) where records don't exist in a Source table (Table SRC).

We already have a Server Job that creates a hash file of the SRC records, uses it as a lookup in a Transformer stage, and does the logical delete (i.e. update table TGT and set delete_flag = 'Y').

This job works fine, but for large tables the hash file creation takes a lot of time.

Wondering if there is an efficient way to do this with PX?
We are new to PX and have been experimenting with Change Capture and Change Apply, but the jobs don't seem to run.

Thanks,
Surekha
Upgrade to 7.0.1 as soon as possible (mid-December is its current scheduled release). This would help with the performance issues on Lookup stages.

Ideal solution:

Code: Select all

               Source Table
                     |
                     V
Target Table -> lookup -> dead link (copy stage)
                     |
                     V
               (Reject) Delete table
Or you can take it a step further and...

Instead of the dead link and reject link, output the table to itself with the truncate option enabled. This is ideal if you have a LOT of deletes going on, enough that the above solution would take a while.

Naturally, the above solutions are dangerous if there is a failure. You need to ensure that you have a way to recover from a mistake/failure.

The lookup table should have a hash sort, and the input table should be "partitioned" (if Oracle; not sure about other databases). It is always a good thing to analyze the table afterward (Oracle-wise).
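
As an example of that last step, a sketch of one way to refresh statistics after the run, assuming Oracle and placeholder schema/table names:

Code: Select all

-- refresh optimizer statistics on the target after the logical deletes;
-- 'DWH' and 'TGT' are placeholder schema and table names
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => 'DWH', tabname => 'TGT');
END;
/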

Naturally, there are other ways to handle it, but this is the best 'performance-wise', especially if your database is running on a smaller box than your DataStage server.

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Re: huh

Post by kcbland »

1stpoint wrote: Ken,
He is UPDATING (logical delete), not deleting, so partitioning would be defeatist here.
Okay, you got me, I wasn't paying total attention. Instead of a delete, it's an update.

Now, can you tell me why a parallel DML query against a partitioned table is NOT the best solution if you are able to supply the partition key and are using locally managed indexes? (Big assumption on my part that this is a data warehouse and is following good DBA practices --> see Tom Kyte's Expert One-on-One book)
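
Spelling that out as a hedged sketch, still assuming Oracle, the placeholder work table of source keys from earlier, and a hypothetical partition key LOAD_DATE:

Code: Select all

-- parallel DML logical delete against the partitioned target;
-- LOAD_DATE, KEY_COL and WORK_SRC_KEYS are placeholders
ALTER SESSION ENABLE PARALLEL DML;

UPDATE /*+ PARALLEL(TGT, 4) */ TGT
   SET delete_flag = 'Y'
 WHERE TGT.LOAD_DATE >= TO_DATE('2003-12-01', 'YYYY-MM-DD')  -- lets the optimizer prune partitions
   AND TGT.KEY_COL NOT IN (SELECT KEY_COL FROM WORK_SRC_KEYS);

COMMIT;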
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
1stpoint
Participant
Posts: 165
Joined: Thu Nov 13, 2003 2:10 pm
Contact:

Post by 1stpoint »

kcbland wrote: Now, can you tell me why a parallel DML query against a partitioned table is NOT the best solution if you are able to supply the partition key and are using locally managed indexes? (Big assumption on my part that this is a data warehouse and is following good DBA practices --> see Tom Kyte's Expert One-on-One book)
That is definitely a good solution, assuming the table is partitioned and they supply the partition key. If I recall the original question correctly, it wasn't the update operation that was slow; it was the hash build.
Su
Participant
Posts: 13
Joined: Thu Sep 26, 2002 7:16 am

Post by Su »

Thanks to all for the wealth of info.

We will be doing physical deletes also, and all this will be very useful.