delete large number of records in DB2
Moderators: chulett, rschirm, roy
Hi,
This is really a problem for DBA-side solutions.
DataStage can only be the tool that triggers your action; it still uses the database's own facilities.
Your best option, had you been at design time, would have been to build that table partitioned by weeks, so you could simply drop one partition and build a new one, making your operation quick.
Now you just need to see what your DBA suggests from the available DB options.
How long does a DELETE statement take for one week's data? (Is it even an option performance-wise?)
IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
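Roy's partition-by-week idea could look something like this in DB2 LUW (a sketch only; the table, column, and partition names are hypothetical, and the exact syntax depends on your DB2 version):

```sql
-- Hypothetical weekly range-partitioned table (DB2 LUW table partitioning).
-- weekly_data, update_date, and part0 are placeholder names.
CREATE TABLE weekly_data (
  update_date DATE NOT NULL,
  payload     VARCHAR(100)
)
PARTITION BY RANGE (update_date)
  (STARTING FROM ('2006-01-01') ENDING ('2006-12-31') EVERY (7 DAYS));

-- Rolling out the oldest week then becomes a fast metadata operation
-- instead of a massive logged DELETE:
ALTER TABLE weekly_data
  DETACH PARTITION part0 INTO TABLE weekly_data_old;
```

The detached partition lands in its own table, which can then be dropped or archived at leisure.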
-
- Premium Member
- Posts: 62
- Joined: Tue Sep 21, 2004 10:24 am
- Location: IBM - Chicago Area
UDB is a different creature than Oracle: as a partitioned database, it spreads its data across different nodes in an attempt to distribute the data as evenly as possible. Partitioning by date ranges is not a common method, as data would cluster on specific nodes, so a single query could bottleneck on one node rather than every query using every node for better performance. The cost is maintenance nightmares, but the performance is excellent.
The option for deleting is to do the deletes in a smaller, distributed fashion. A single DELETE FROM table WHERE yourdate <= today - 365 is overwhelming on huge tables. Your best bet is probably a cycling delete script with a finite date window, so that you stay within your transaction log space and also get some deletes finished.
You should consider a stored procedure that takes a date as an argument and then deletes one day at a time, looping backwards through the 7 days required. Execute your delete query for the specific day with a row limit, such as 250K. Repeat that query until no rows are deleted, then move to the next day. Play with the 250K number to stay under the transaction log capacity.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Thanks Kenneth,
kcbland wrote:UDB is a different creature than Oracle: as a partitioned database, it spreads its data across different nodes in an attempt to distribute the data as evenly as possible. Partitioning by date ranges is not a common method, as data would cluster on specific nodes, so a single query could bottleneck on one node rather than every query using every node for better performance. The cost is maintenance nightmares, but the performance is excellent.
The option for deleting is to do the deletes in a smaller, distributed fashion. A single DELETE FROM table WHERE yourdate <= today - 365 is overwhelming on huge tables. Your best bet is probably a cycling delete script with a finite date window, so that you stay within your transaction log space and also get some deletes finished.
You should consider a stored procedure that takes a date as an argument and then deletes one day at a time, looping backwards through the 7 days required. Execute your delete query for the specific day with a row limit, such as 250K. Repeat that query until no rows are deleted, then move to the next day. Play with the 250K number to stay under the transaction log capacity.
Thing is, the update_date values in the table are broken up on a weekly basis, so I can't separate the data day by day...
To delete a certain number of rows at a time using a cycling delete, do I have to use a Java program to break up the delete statements with some parameter, or will simple SQL work in this case?
Doing analysis to figure out how to break the deletes into manageable chunks...
Thanks
Gaurav
You can use a stored procedure. Here's how I'd do it in Oracle, you can figure out the method for UDB:
Code: Select all
declare
  max_rows integer := 1000;
begin
  loop
    delete from mytable
     where mydate between begindate and enddate
       and rownum <= max_rows;
    exit when sql%rowcount = 0;
    commit;
  end loop;
end;
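A rough DB2 translation of the same idea, since UDB has no rownum (a sketch in DB2 SQL PL; purge_range, mytable, and mydate are placeholder names, and the chunk size is arbitrary):

```sql
-- Hypothetical DB2 SQL PL version of the Oracle block above.
-- All object names are placeholders.
CREATE PROCEDURE purge_range (IN p_begin DATE, IN p_end DATE)
LANGUAGE SQL
BEGIN
  DECLARE v_rows INTEGER DEFAULT 1;
  WHILE v_rows > 0 DO
    -- DB2 idiom: delete through a fullselect limited with FETCH FIRST,
    -- in place of Oracle's rownum predicate.
    DELETE FROM (SELECT 1 FROM mytable
                  WHERE mydate BETWEEN p_begin AND p_end
                  FETCH FIRST 1000 ROWS ONLY);
    GET DIAGNOSTICS v_rows = ROW_COUNT;
    COMMIT;
  END WHILE;
END
```

Each pass deletes at most one chunk and commits, so the transaction log only ever has to hold one chunk's worth of work.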
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Great thanks Ken,
kcbland wrote:You can use a stored procedure. Here's how I'd do it in Oracle, you can figure out the method for UDB:
Let me look into it...
Gaurav
RobinM wrote:Old thread I know, but...
Could you use REORG DISCARD in your environment?
Gaurav.Dave wrote:I am not familiar with this. Can you give me some more information about it?
Thanks,
Gaurav
As an option, when you run the REORG utility in DB2 you can specify DISCARD FROM TABLE <xxx> WHEN <predicate>. As part of the REORG, the utility excludes any data matching your condition from the data it loads back into the table. This can be a more efficient way of deleting large volumes of data than a simple SQL DELETE statement.
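For illustration, a REORG with DISCARD can look roughly like this (DB2 for z/OS utility syntax; the tablespace, table, and column names are all hypothetical):

```sql
-- Hypothetical REORG utility control statement; all names are placeholders.
-- Rows matching the WHEN predicate are discarded instead of being reloaded.
REORG TABLESPACE MYDB.MYTS
  DISCARD FROM TABLE MYSCHEMA.MYTABLE
  WHEN (UPDATE_DATE <= CURRENT DATE - 365 DAYS)
```

Since the table is rewritten anyway, the discarded rows never hit the transaction log the way an SQL DELETE would.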
Robin Moffatt, London, UK
Time flies like the wind, fruit flies like a pear
Hi Gaurav
Use Kenneth's stored procedure, which is the most efficient way to delete records in your case, and do the REORG once all the records are deleted. The REORG will improve performance whenever there is a big change to the table. Involve a DB2 DBA and do this when the system is idle.
Regards
Siva
Listening to the Learned
"The most precious wealth is the wealth acquired by the ear Indeed, of all wealth that wealth is the crown." - Thirukural By Thiruvalluvar
rasi wrote:Use Kenneth's stored procedure, which is the most efficient way to delete records in your case, and do the REORG once all the records are deleted.
I'm not familiar with the stored procedure, but if you're going to be REORGing a tablespace anyway, would it not be more efficient to have it delete your records at the same time (i.e. by including the DISCARD parameter)?
Robin Moffatt, London, UK
Time flies like the wind, fruit flies like a pear
possibly a simpler method
another way of approaching this is to break the delete statement into chunks:
delete from MYTABLE where KEYFIELD in (
  select KEYFIELD from MYTABLE fetch first 1000 rows only
);
And keep running this in a loop until all the records are gone!
Obviously you want to bump the 1000-row fetch limit up to something a bit more reasonable.
I unmarked the Premium Content so that everyone can see the original post replies.
The issue is what can be done within ETL that doesn't involve DBAs. Often we have to do things within ETL that are part of an automated load cycle. If you design a solution that periodically breaks because you've exceeded your tolerances on your database setup, you need alternatives.
Simply REORGing as part of an ETL cycle is usually a bad idea. Those are maintenance efforts done periodically to improve performance and optimize storage.
If you have a program that blows up when issuing commands that will exceed rollback, you need to mitigate that with programming solutions to get back within acceptable tolerances. A procedural DELETE is one such solution.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kcbland wrote:Simply REORGing as part of an ETL cycle is usually a bad idea. Those are maintenance efforts done periodically to improve performance and optimize storage.
I personally disagree; we use REORGs on a regular basis to delete large volumes of data. I suppose it's horses for courses really, but if you're going to be deleting lots of data then your table will need a REORG anyway, in which case why not dispense with the delete entirely and kill two birds with one REORG DISCARD?
Just my tuppence worth...
Robin Moffatt, London, UK
Time flies like the wind, fruit flies like a pear