delete large number of records in DB2
Moderators: chulett, rschirm, roy
Hi,
This is really a problem for DBA-side solutions.
DataStage can only be the tool that triggers your action; it still uses the database's own facilities.
Your best option, had you been at design time, would have been to build that table partitioned by weeks, so you could simply drop one partition and build a new one, making your operation quick.
Now you just need to see what your DBA suggests from the available DB options.
How long does a DELETE statement take for one week's data? (Is it even an option performance-wise?)
IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.
Search before posting:)
Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
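Roy's partition-by-week idea could look something like this in DB2 LUW (a sketch only; the table, column, and partition names are hypothetical, and the exact syntax depends on your DB2 version):

```sql
-- Hypothetical weekly range-partitioned table (DB2 LUW table partitioning).
-- weekly_data, update_date, and part0 are placeholder names.
CREATE TABLE weekly_data (
  update_date DATE NOT NULL,
  payload     VARCHAR(100)
)
PARTITION BY RANGE (update_date)
  (STARTING FROM ('2006-01-01') ENDING ('2006-12-31') EVERY (7 DAYS));

-- Rolling out the oldest week then becomes a fast metadata operation
-- instead of a massive logged DELETE:
ALTER TABLE weekly_data
  DETACH PARTITION part0 INTO TABLE weekly_data_old;
```

The detached partition lands in its own table, which can then be dropped or archived at leisure.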
-
- Premium Member
- Posts: 62
- Joined: Tue Sep 21, 2004 10:24 am
- Location: IBM - Chicago Area
UDB is a different creature than Oracle: as a partitioned database, it spreads its data across different nodes in an attempt to distribute the data as evenly as possible. Partitioning by date ranges is not a common method, as data would cluster on specific nodes, so a single query could bottleneck on one node rather than every query using every node for better performance. The cost is maintenance nightmares, but the performance is excellent.
The option for deleting is to do the deletes in a smaller, distributed fashion. A single DELETE FROM table WHERE yourdate <= today - 365 is overwhelming on huge tables. Your best bet is probably a cycling delete script with a finite date window, so that you stay within your transaction log space and also get some deletes finished.
You should consider a stored procedure that takes a date as an argument and then deletes one day at a time, looping backwards through the 7 days required. Execute your delete query for the specific day with a row limit, such as 250K. Repeat that query until no rows are deleted, then move to the next day. Play with the 250K number to stay under the transaction log capacity.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Thanks Kenneth,
kcbland wrote:UDB is a different creature than Oracle: as a partitioned database, it spreads its data across different nodes in an attempt to distribute the data as evenly as possible. Partitioning by date ranges is not a common method, as data would cluster on specific nodes, so a single query could bottleneck on one node rather than every query using every node for better performance. The cost is maintenance nightmares, but the performance is excellent.
The option for deleting is to do the deletes in a smaller, distributed fashion. A single DELETE FROM table WHERE yourdate <= today - 365 is overwhelming on huge tables. Your best bet is probably a cycling delete script with a finite date window, so that you stay within your transaction log space and also get some deletes finished.
You should consider a stored procedure that takes a date as an argument and then deletes one day at a time, looping backwards through the 7 days required. Execute your delete query for the specific day with a row limit, such as 250K. Repeat that query until no rows are deleted, then move to the next day. Play with the 250K number to stay under the transaction log capacity.
Thing is, the update_date values in the table are broken up on a weekly basis, so I can't separate the data day by day...
To delete a certain number of rows at a time using a cycling delete, do I have to use a Java program to break up the delete statements with some parameter, or will simple SQL work in this case?
Doing analysis to figure out how to break the deletes into manageable chunks...
Thanks
Gaurav
You can use a stored procedure. Here's how I'd do it in Oracle, you can figure out the method for UDB:
Code: Select all
declare
  max_rows integer := 1000;
begin
  loop
    delete from mytable
     where mydate between begindate and enddate
       and rownum <= max_rows;
    exit when sql%rowcount = 0;
    commit;
  end loop;
end;
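A rough DB2 translation of the same idea, since UDB has no rownum (a sketch in DB2 SQL PL; purge_range, mytable, and mydate are placeholder names, and the chunk size is arbitrary):

```sql
-- Hypothetical DB2 SQL PL version of the Oracle block above.
-- All object names are placeholders.
CREATE PROCEDURE purge_range (IN p_begin DATE, IN p_end DATE)
LANGUAGE SQL
BEGIN
  DECLARE v_rows INTEGER DEFAULT 1;
  WHILE v_rows > 0 DO
    -- DB2 idiom: delete through a fullselect limited with FETCH FIRST,
    -- in place of Oracle's rownum predicate.
    DELETE FROM (SELECT 1 FROM mytable
                  WHERE mydate BETWEEN p_begin AND p_end
                  FETCH FIRST 1000 ROWS ONLY);
    GET DIAGNOSTICS v_rows = ROW_COUNT;
    COMMIT;
  END WHILE;
END
```

Each pass deletes at most one chunk and commits, so the transaction log only ever has to hold one chunk's worth of work.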
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Great thanks Ken,
kcbland wrote:You can use a stored procedure. Here's how I'd do it in Oracle, you can figure out the method for UDB:
Let me look into it...
Gaurav
RobinM wrote:Old thread I know, but...
Could you use REORG DISCARD in your environment?
Gaurav.Dave wrote:I am not familiar with this. Can you give me some more information about it?
Thanks,
Gaurav
As an option, when you run the REORG utility in DB2 you can specify DISCARD FROM TABLE <xxx> WHEN <predicate>. As part of the REORG, the utility excludes any data matching your condition from the data it loads back into the table. This can be a more efficient way of deleting large volumes of data than a simple SQL DELETE statement.
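For illustration, a REORG with DISCARD can look roughly like this (DB2 for z/OS utility syntax; the tablespace, table, and column names are all hypothetical):

```sql
-- Hypothetical REORG utility control statement; all names are placeholders.
-- Rows matching the WHEN predicate are discarded instead of being reloaded.
REORG TABLESPACE MYDB.MYTS
  DISCARD FROM TABLE MYSCHEMA.MYTABLE
  WHEN (UPDATE_DATE <= CURRENT DATE - 365 DAYS)
```

Since the table is rewritten anyway, the discarded rows never hit the transaction log the way an SQL DELETE would.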
Robin Moffatt, London, UK
Time flies like the wind, fruit flies like a pear
Hi Gaurav
Use Kenneth's stored procedure, which is the most efficient way to delete records in your case, and do the REORG once all the records are deleted. The REORG will improve performance whenever there is a big change to the table. Involve a DB2 DBA and do this when the system is idle.
Regards
Siva
Listening to the Learned
"The most precious wealth is the wealth acquired by the ear Indeed, of all wealth that wealth is the crown." - Thirukural By Thiruvalluvar
rasi wrote:Use Kenneth's stored procedure, which is the most efficient way to delete records in your case, and do the REORG once all the records are deleted.
I'm not familiar with the stored procedure, but if you're going to be REORGing a tablespace anyway, would it not be more efficient to have it delete your records at the same time (i.e. by including the DISCARD parameter)?
Robin Moffatt, London, UK
Time flies like the wind, fruit flies like a pear
possibly a simpler method
another way of approaching this is to break the delete statement into chunks:
delete from MYTABLE where KEYFIELD in (
  select KEYFIELD from MYTABLE fetch first 1000 rows only
);
And keep running this in a loop until all the records are gone!
Obviously you want to bump the 1000-row fetch limit up to something a bit more reasonable.
I unmarked the Premium Content so that everyone can see the original post replies.
The issue is what can be done within ETL that doesn't involve DBAs. Often we have to do things within ETL that are part of an automated load cycle. If you design a solution that periodically breaks because you've exceeded your tolerances on your database setup, you need alternatives.
Simply REORGing as part of an ETL cycle is usually a bad idea. Those are maintenance efforts done periodically to improve performance and optimize storage.
If you have a program that blows up when issuing commands that will exceed rollback, you need to mitigate that with programming solutions to get back within acceptable tolerances. A procedural DELETE is one such solution.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kcbland wrote:Simply REORGing as part of an ETL cycle is usually a bad idea. Those are maintenance efforts done periodically to improve performance and optimize storage.
I personally disagree; we use REORGs on a regular basis to delete large volumes of data. I suppose it's horses for courses really, but if you're going to be deleting lots of data then your table will need a REORG anyway, in which case why not dispense with the delete entirely and kill two birds with one REORG DISCARD?
Just my tuppence worth...
Robin Moffatt, London, UK
Time flies like the wind, fruit flies like a pear