WordPress’s Akismet pushes Database to 30x real size

Like half the planet, I use WordPress as a blogging platform. There are many good things about it. One of those used to be Akismet (if you ignore the slightly unpleasant sales strategy: it’s free, if you give WordPress some personal details) – but today I hit a very serious bug in Akismet. I’ve had to delete Akismet – but WordPress’s coders don’t clean up after themselves, so you have to do some manual clean-up too. Read on.

Spamageddon

I was getting an average of 200 spams per day that Akismet couldn’t detect (O, RLY?). On the advice of Paul Silver, I tried Anti-Spam for WordPress … and the spam rate dropped by 99%. Joy!

A couple of days later, I noticed my database backups (courtesy of DB Backup for WordPress were failing to auto-download. (bear with me, this is related…).

DB-Backup has an option to manually run a backup in your web browser and immediately save it to disk. This normally takes 30 seconds; this time, it took 4 hours. Looking at the output, it was saving 750 megabyte backup files (normally: they’re 20 megabytes or less).

What?

Irony: Akismet spams your database

A couple of years ago, WordPress added a feature to Akismet where it stores a vast amount of extra info to every comment. I can guess what that might include, but as far as I can tell, Akismet is proprietary, so no-one really knows how it works. It’s supposed to delete this data after 15 days (or even: immediately, depending on your settings). In practice, this appears never to have worked (WordPress’s forums + blogs + twitter are full of complaints about it. NB: I have the latest version of Akismet plugin).

Login to your SQL admin account (or use PHPMyAdmin … or whatever your preferred way of editing your SQL database is).

Count the damage

With your wordpress database selected (NB: you should have one per website/blog on your server; make sure you pick the right one), these commands merely “count” stuff, they have no side-effects. The worse that can happen is they slow down your server for a few seconds while they’re counting.

select count(*) from wp_commentmeta where meta_key like "%akismet_%";

+———-+
| count(*) |
+———-+
| 545201 |
+———-+

Half a million comments that Akismet has added its “proprietary” info too. OK, if that’s what it takes to defeat spammers, so be it.

But how many are actually “spam”?

SELECT comment_approved, count(*) FROM wp_comments group by comment_approved;

+——————+———-+
| comment_approved | count(*) |
+——————+———-+
| 0 | 236 |
| 1 | 2887 |
| spam | 185455 |
| trash | 22 |
+——————+———-+

Wait, what? Of 500,000 comments … 300,000 (60%) don’t actually exist. The original comments (presumably spam) were deleted / auto-deleted long ago – but Akismet is keeping hundreds of megabytes of data on them.

Of what’s left, the majority (almost 200,000) are marked as “spam”. OK, great – it’s done a good job over the years (until recently, when spammers found their way around it). That’s not unexpected.

But since Akismet’s proprietary info is a little verbose (I’m not sure they planned this too well) and their bugs prevent it from ever being deleted … we’re wasting hundreds of MB here. This makes database backups difficult – we’re not backing up “the website and content”, we’re backing up “Akismet’s junk … plus a tiny amount of website content”.

NB: within WordPress, it seems there was a more-sensible way for Akismet to store this data, but they didn’t use it. Like all other plugins, they should (surely?) have been saving their data to a separate table, preventing these problems, and making it much easier for them to fix their deletion bugs. One poor decision by Akismet team = problems for everyone?

Delete the Akismet Scourge

First, deactivate the plugin – these days, it has very little benefit over-and-above my default WordPress anti-spam setting (“First comment from any new user requires moderator approval” — blocks 95% of all spam instantly)

…but we have to manually clear out the crud. These next commands ARE destructive, so please don’t typo them – check carefully:

1: get rid of akismet proprietary info

delete from wp_commentmeta where meta_key like "%akismet_%";

Note: my server is fast, but this took a long time:

Query OK, 545201 rows affected (2 min 26.85 sec)

2: delete all the old spams

These spams now have no further purpose (and in some cases, due to badly-written plugins, or WordPress design flaws: they can slowdown WordPress itself, especially the admin interface).

DELETE FROM wp_comments where comment_approved='spam' OR comment_approved='trash';

…and check they’ve gone:

SELECT comment_approved, count(*) FROM wp_comments group by comment_approved;
+——————+———-+
| comment_approved | count(*) |
+——————+———-+
| 0 | 236 |
| 1 | 2887 |
+——————+———-+

3 thoughts on “WordPress’s Akismet pushes Database to 30x real size

  1. afslankpillen

    Ola! Adam,
    Thanks for that, Hi! Posts wont show up after clicking Categories both on the top as well as the side panel. I’m using revolution lifestyle theme 1.0. I also have this plugins: akismet, bsuite, category posts widget (activated but not using the widget), the excerpt reloaded and scriblio. I was experimenting on some plugins but had them removed/deleted. My categories are working fine before but now it wont show any posts. I get this error: NOT FOUND, ERROR 404. I tried to change my theme to the default wordpress theme. Categories worked but when I switched back to revo lifestyle, categories wont show posts again. What should I do. I’m not so techie.
    Keep up the good work

  2. adam Post author

    Debugging wordpress often requires reading/writing PHP code. If that’s beyond you, I’d recommend hiring a WordPress developer to maintain your site.

    Otherwise, you have to do the standar procedure of disabling everything, going back to the standard theme, and re-enabling themes and plugins one by one until you find the broken one.

  3. Simone

    Thanks so much for this!!! Reduced the sites database size from 2.44 GB to 117.3 MB. That comment_meta table kills!

Leave a Reply

Your email address will not be published. Required fields are marked *