delete large number of records in DB2

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
You are facing a problem that calls for a DBA solution.
DS can only be the tool that triggers your action; it still uses the database's own facilities.

Your best option, had you been at design time, would have been to build that table partitioned by week, so you could simply drop one partition and build a new one, making your operation quick.

Now you only need to see what your DBA suggests, according to the available DB options.

How long does a delete statement take for 1 week's info? (Is it even an option performance wise?)

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Gaurav.Dave
Premium Member
Posts: 62
Joined: Tue Sep 21, 2004 10:24 am
Location: IBM - Chicago Area

Post by Gaurav.Dave »

Thanks Roy,

Didn't try the delete statement, as I was aware of the transaction log problem... Instead I tried export/import... It takes 6-7 hours on our test environment using the db2 autoloader.

Gaurav
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

UDB is a different creature than Oracle: as a partitioned database, it spreads its data around different nodes in an attempt to distribute the data as evenly as possible. Partitioning by date ranges is not a common method, as data would group onto specific nodes, and a single query could therefore bottleneck on one node rather than every query using every node for better performance. The cost is maintenance nightmares, but the performance is excellent.

The option for deleting is to do the deletes in a smaller, distributed fashion. A single DELETE FROM yourtable WHERE yourdate <= today - 365 is overwhelming when querying huge tables. Your best bet is probably to use a cycling delete script with a finite date window, so that you stay within your transaction log space and also get some deletes finished.

You should consider a stored procedure that takes a date as an argument, and then deletes one day at a time in a loop, working backwards for the 7 days required. Execute your delete query for the specific day, and set a row limit to delete, such as 250K. Repeat that query until no rows are deleted, then move to the next day. Play with the 250K number to stay under the transaction log capacity.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Gaurav.Dave
Premium Member
Posts: 62
Joined: Tue Sep 21, 2004 10:24 am
Location: IBM - Chicago Area

Post by Gaurav.Dave »

kcbland wrote:UDB is a different creature than Oracle: as a partitioned database, it spreads its data around different nodes in an attempt to distribute the data as evenly as possible. Partitioning by date ranges is not a common method, as data would group onto specific nodes, and a single query could therefore bottleneck on one node rather than every query using every node for better performance. The cost is maintenance nightmares, but the performance is excellent.

The option for deleting is to do the deletes in a smaller, distributed fashion. A single DELETE FROM yourtable WHERE yourdate <= today - 365 is overwhelming when querying huge tables. Your best bet is probably to use a cycling delete script with a finite date window, so that you stay within your transaction log space and also get some deletes finished.

You should consider a stored procedure that takes a date as an argument, and then deletes one day at a time in a loop, working backwards for the 7 days required. Execute your delete query for the specific day, and set a row limit to delete, such as 250K. Repeat that query until no rows are deleted, then move to the next day. Play with the 250K number to stay under the transaction log capacity.
Thanks Kenneth,

The thing is, the update_date value inside the table is broken up on a weekly basis, so I can't separate the data by day...

To delete a certain number of rows at a time using a cycling delete, do I have to use a Java program to break up the delete statements using some parameter, or would simple SQL work in this case?

Doing analysis to figure out how to break up the deletes into manageable chunks.....


Thanks
Gaurav
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

You can use a stored procedure. Here's how I'd do it in Oracle, you can figure out the method for UDB:

Code: Select all

declare
  max_rows integer := 1000;  -- rows per chunk; tune to fit your rollback/log space
begin
  loop
    delete from mytable
     where mydate between begindate and enddate
       and rownum <= max_rows;
    exit when sql%rowcount = 0;  -- nothing left in the date window
    commit;                      -- commit each chunk to release log space
  end loop;
end;
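A UDB translation of the same loop might look like the following SQL PL sketch. This is an assumption rather than tested code: mytable, mykey, and mydate are placeholder names, and it caps each chunk with an IN (... FETCH FIRST n ROWS ONLY) subselect since DB2 has no ROWNUM.

```sql
-- Sketch only: placeholder names; chunk size to be tuned against the log
CREATE PROCEDURE purge_window (IN p_begin DATE, IN p_end DATE)
LANGUAGE SQL
BEGIN
  DECLARE v_rows INTEGER DEFAULT 1;
  WHILE v_rows > 0 DO
    DELETE FROM mytable
     WHERE mykey IN (SELECT mykey FROM mytable
                      WHERE mydate BETWEEN p_begin AND p_end
                      FETCH FIRST 1000 ROWS ONLY);
    GET DIAGNOSTICS v_rows = ROW_COUNT;
    COMMIT;  -- commit each chunk so the transaction log stays small
  END WHILE;
END
```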
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Gaurav.Dave
Premium Member
Posts: 62
Joined: Tue Sep 21, 2004 10:24 am
Location: IBM - Chicago Area

Post by Gaurav.Dave »

kcbland wrote:You can use a stored procedure. Here's how I'd do it in Oracle, you can figure out the method for UDB:

Code: Select all

declare
  max_rows integer := 1000;  -- rows per chunk; tune to fit your rollback/log space
begin
  loop
    delete from mytable
     where mydate between begindate and enddate
       and rownum <= max_rows;
    exit when sql%rowcount = 0;  -- nothing left in the date window
    commit;                      -- commit each chunk to release log space
  end loop;
end;
Great thanks Ken,

Let me look into it...

Gaurav
peterbaun
Premium Member
Posts: 93
Joined: Thu Jul 10, 2003 5:27 am
Location: Denmark
Contact:

Post by peterbaun »

Hi Gaurav,

On DB2/AS400 you have the option of adding "with none" to the delete statement, e.g.:
delete from huge_table where week = 1 with none

This will disable the logging for that particular SQL statement.

I don't know if you have this feature on a "regular" DB2 database.
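On "regular" DB2 (UDB/LUW) there is no WITH NONE clause, but a possibly comparable trick is NOT LOGGED INITIALLY. This is an assumption to verify with your DBA: it only works if the table was created with the NOT LOGGED INITIALLY attribute, and huge_table below is a placeholder name.

```sql
-- Sketch only. If this unit of work rolls back, the table is left
-- unusable, so treat this as a DBA-supervised step.
ALTER TABLE huge_table ACTIVATE NOT LOGGED INITIALLY;
DELETE FROM huge_table WHERE week = 1;
COMMIT;
```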

Regards
Peter
RobinM
Charter Member
Posts: 25
Joined: Wed Feb 15, 2006 5:18 am
Location: London

Post by RobinM »

Old thread I know, but...

Could you use REORG DISCARD in your environment?
Robin Moffatt, London, UK
Time flies like the wind, fruit flies like a pear
Gaurav.Dave
Premium Member
Posts: 62
Joined: Tue Sep 21, 2004 10:24 am
Location: IBM - Chicago Area

Post by Gaurav.Dave »

RobinM wrote:Old thread I know, but...

Could you use REORG DISCARD in your environment?
I am not familiar with this. Can you give me some more information about it?

Thanks,
Gaurav
RobinM
Charter Member
Posts: 25
Joined: Wed Feb 15, 2006 5:18 am
Location: London

Post by RobinM »

Gaurav.Dave wrote:
RobinM wrote:Old thread I know, but...

Could you use REORG DISCARD in your environment?
I am not familiar with this. Can you give me some more information about it?

Thanks,
Gaurav
As an option when you run the REORG utility in DB2, you can specify DISCARD FROM TABLE <xxx> WHEN <predicate>. As part of the REORG, the utility excludes any data matching your condition from the data that is loaded back into the table. This can be a more efficient way of deleting large volumes of data than a simple SQL DELETE statement.
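For reference, on DB2 for z/OS the utility control statement might look something like the following. This is a sketch only: the database, tablespace, table, and column names are placeholders, and the exact syntax should be checked against the utility documentation for your DB2 version.

```sql
REORG TABLESPACE MYDB.MYTS
  DISCARD FROM TABLE MYSCHEMA.HUGE_TABLE
    WHEN (UPDATE_DATE < CURRENT DATE - 365 DAYS)
```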
Robin Moffatt, London, UK
Time flies like the wind, fruit flies like a pear
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

Hi Gaurav

Use Kenneth's stored procedure, which is the most efficient way to delete records in your case, and do the REORG once all the records are deleted. The REORG will improve performance whenever there is a big change to the table. Involve a DB2 DBA to do this when the system is idle.
Regards
Siva

Listening to the Learned

"The most precious wealth is the wealth acquired by the ear Indeed, of all wealth that wealth is the crown." - Thirukural By Thiruvalluvar
RobinM
Charter Member
Charter Member
Posts: 25
Joined: Wed Feb 15, 2006 5:18 am
Location: London

Post by RobinM »

rasi wrote:Hi Gaurav

Use Kenneth's stored procedure, which is the most efficient way to delete records in your case, and do the REORG once all the records are deleted. The REORG will improve performance whenever there is a big change to the table. Involve a DB2 DBA to do this when the system is idle.
I'm not familiar with the stored procedure, but if you're going to be REORGing a tablespace anyway would it not be more efficient to get it to delete your records at the same time (i.e. by including the DISCARD parameter)?
Robin Moffatt, London, UK
Time flies like the wind, fruit flies like a pear
fridge
Premium Member
Posts: 136
Joined: Sat Jan 10, 2004 8:51 am

possibly a simpler method

Post by fridge »

Another way of approaching this is to break the delete statement into chunks:

delete from MYTABLE where KEYFIELD in (
select KEYFIELD from MYTABLE fetch first 1000 rows only
)
;

And keep running this in a loop until all the records are gone!

Obviously you want to bump up the 1000 fetch limit to something a bit more reasonable.
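Since DB2 isn't to hand here, the loop behaviour of that pattern can be simulated with SQLite via Python's standard sqlite3 module. The table, rows, and chunk size below are all made up for the demonstration; LIMIT plays the role of FETCH FIRST n ROWS ONLY.

```python
# Simulation of the chunked-delete pattern: delete at most `chunk`
# matching rows per pass, committing each pass, until none remain.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (keyfield INTEGER PRIMARY KEY, week INTEGER)")
# 5 weeks of data, 500 rows per week
conn.executemany("INSERT INTO mytable (week) VALUES (?)",
                 [(w,) for w in range(1, 6) for _ in range(500)])

chunk = 1000
deleted = 0
while True:
    cur = conn.execute(
        "DELETE FROM mytable WHERE keyfield IN "
        "(SELECT keyfield FROM mytable WHERE week = 1 LIMIT ?)", (chunk,))
    conn.commit()  # commit each chunk, as in the forum loop
    if cur.rowcount == 0:
        break      # no matching rows left
    deleted += cur.rowcount

print(deleted)
```

With a real transaction log, each commit is what keeps the pass from overflowing it; here it simply shows where the commits land in the loop.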
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

I unmarked the Premium Content so that everyone can see the original post replies.

The issue is what can be done within ETL that doesn't involve DBAs. Often we have to do things within ETL that are part of an automated load cycle. If you design a solution that periodically breaks because you've exceeded your tolerances on your database setup, you need alternatives.

Simply REORGing as part of an ETL cycle is usually a bad idea. Those are maintenance efforts done periodically to improve performance and optimize storage.

If you have a program that blows up when issuing commands that will exceed rollback, you need to mitigate that with programming solutions to get back within acceptable tolerances. A procedural DELETE is one such solution.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
RobinM
Charter Member
Posts: 25
Joined: Wed Feb 15, 2006 5:18 am
Location: London

Post by RobinM »

kcbland wrote:Simply REORGing as part of an ETL cycle is usually a bad idea. Those are maintenance efforts done periodically to improve performance and optimize storage.
I personally disagree; we use REORGs on a regular basis to delete large volumes of data. I suppose it's horses for courses really, but if you're going to be deleting lots of data then your table will need a REORG anyway, in which case why not dispense with the delete entirely and kill two birds with one REORG DISCARD :-)

Just my tuppence worth...
Robin Moffatt, London, UK
Time flies like the wind, fruit flies like a pear