Issue
I am developping a Hibernate / Spring application with a built-in database maintenance feature. Every 15 minutes it looks at certain tables and will purge old records based on certain parameters. In the case of my LogEntry entity for example, i am purging based on 2 parameters : how old the record is and how many records are in the table. In the first case, I am doing something like this :
@Override
public int deleteExpiredEntries(int systemLogKeepTimeInDays, int systemLogMaxEntries)
{
Session session = getSession();
Query query = session.createQuery("DELETE FROM LogEntry l WHERE l.time < :p");
Calendar cal = Calendar.getInstance();
cal.setTime(new Date());
cal.add(Calendar.DAY_OF_YEAR, -systemLogKeepTimeInDays);
return query.setParameter("p", cal.getTime()).executeUpdate();
}
How can I keep say 5000 records when this maintenance runs using a similar process?
I thought of using the Id column and purging whatever has a greater Id than 5000 but this would actually purge the new records and not the old ones!
How would you approach this problem?
Solution
You can try to solve the problem with SQL query itself.
First, you're gonna need to take first top 5000 new records.
SELECT id FROM LogEntry ORDER BY time DESC LIMIT 100000 OFFSET 5000;
I would recommend you to use LIMIT and set it to suit your needs so the query won't take too long to execute. By this, you can execute query as many time as you can and you'll always grab the freshest data because of ORDER BY and because of OFFSET 5000 you just preserved newest 5000 records.
The next step would be deleting:
DELETE FROM LogEntry WHERE id IN
(SELECT id FROM ( SELECT id FROM LogEntry ORDER BY time DESC LIMIT 100000 OFFSET 5000) table_alias);
Maybe you are wondering why do I use SELECT upon subquery. That's because I need a reference to that result I selected, as you see it's named table_alias. MySql itself won't execute query if you try to execute DELETE with subquery using IN keyword. You'll get the error:
This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery
The official documentations (says):
In general, you cannot modify a table and select from the same table in a subquery.
Exception: The preceding prohibition does not apply if for the modified table you are using a derived table (subquery in the FROM clause) and that derived table is materialized rather than merged into the outer query.
MSSQL Solution
DELETE FROM LogEntry WHERE id BETWEEN
(
SELECT MIN(id) FROM LogEntry ORDER BY time DESC
OFFSET 5000 ROWS
FETCH NEXT 100000 ROWS ONLY
)
AND
(SELECT MAX(id) FROM LogEntry ORDER BY time DESC
OFFSET 5000 ROWS
FETCH NEXT 100000 ROWS ONLY)
Since you've asked for MSSQL solution in comments, I've tried to come up with something. I haven't tested this query, but I have reference which lead me to it: this similar topic and because you need to ignore first 5000 rows, visit thispage
I hope this helped at least a little bit or even give you a better idea than mine of what could you do!
Answered By - Marko
Answer Checked By - Katrina (JavaFixing Volunteer)