WordPress’s Akismet pushes Database to 30x real size

Like half the planet, I use WordPress as a blogging platform. There are many good things about it. One of those used to be Akismet (if you ignore the slightly unpleasant sales strategy: it’s free, if you give WordPress some personal details) – but today I hit a very serious bug in Akismet. I’ve had to delete Akismet – but WordPress’s coders don’t clean up after themselves, so you have to do some manual clean-up too. Read on.

Spamageddon

I was getting an average of 200 spams per day that Akismet couldn’t detect (O, RLY?). On the advice of Paul Silver, I tried Anti-Spam for WordPress … and the spam rate dropped by 99%. Joy!

A couple of days later, I noticed my database backups (courtesy of DB Backup for WordPress were failing to auto-download. (bear with me, this is related…).

DB-Backup has an option to manually run a backup in your web browser and immediately save it to disk. This normally takes 30 seconds; this time, it took 4 hours. Looking at the output, it was saving 750 megabyte backup files (normally: they’re 20 megabytes or less).

What?

Irony: Akismet spams your database

A couple of years ago, WordPress added a feature to Akismet where it stores a vast amount of extra info to every comment. I can guess what that might include, but as far as I can tell, Akismet is proprietary, so no-one really knows how it works. It’s supposed to delete this data after 15 days (or even: immediately, depending on your settings). In practice, this appears never to have worked (WordPress’s forums + blogs + twitter are full of complaints about it. NB: I have the latest version of Akismet plugin).

Login to your SQL admin account (or use PHPMyAdmin … or whatever your preferred way of editing your SQL database is).

Count the damage

With your wordpress database selected (NB: you should have one per website/blog on your server; make sure you pick the right one), these commands merely “count” stuff, they have no side-effects. The worse that can happen is they slow down your server for a few seconds while they’re counting.

select count(*) from wp_commentmeta where meta_key like "%akismet_%";

+———-+
| count(*) |
+———-+
| 545201 |
+———-+

Half a million comments that Akismet has added its “proprietary” info too. OK, if that’s what it takes to defeat spammers, so be it.

But how many are actually “spam”?

SELECT comment_approved, count(*) FROM wp_comments group by comment_approved;

+——————+———-+
| comment_approved | count(*) |
+——————+———-+
| 0 | 236 |
| 1 | 2887 |
| spam | 185455 |
| trash | 22 |
+——————+———-+

Wait, what? Of 500,000 comments … 300,000 (60%) don’t actually exist. The original comments (presumably spam) were deleted / auto-deleted long ago – but Akismet is keeping hundreds of megabytes of data on them.

Of what’s left, the majority (almost 200,000) are marked as “spam”. OK, great – it’s done a good job over the years (until recently, when spammers found their way around it). That’s not unexpected.

But since Akismet’s proprietary info is a little verbose (I’m not sure they planned this too well) and their bugs prevent it from ever being deleted … we’re wasting hundreds of MB here. This makes database backups difficult – we’re not backing up “the website and content”, we’re backing up “Akismet’s junk … plus a tiny amount of website content”.

NB: within WordPress, it seems there was a more-sensible way for Akismet to store this data, but they didn’t use it. Like all other plugins, they should (surely?) have been saving their data to a separate table, preventing these problems, and making it much easier for them to fix their deletion bugs. One poor decision by Akismet team = problems for everyone?

Delete the Akismet Scourge

First, deactivate the plugin – these days, it has very little benefit over-and-above my default WordPress anti-spam setting (“First comment from any new user requires moderator approval” — blocks 95% of all spam instantly)

…but we have to manually clear out the crud. These next commands ARE destructive, so please don’t typo them – check carefully:

1: get rid of akismet proprietary info

delete from wp_commentmeta where meta_key like "%akismet_%";

Note: my server is fast, but this took a long time:

Query OK, 545201 rows affected (2 min 26.85 sec)

2: delete all the old spams

These spams now have no further purpose (and in some cases, due to badly-written plugins, or WordPress design flaws: they can slowdown WordPress itself, especially the admin interface).

DELETE FROM wp_comments where comment_approved='spam' OR comment_approved='trash';

…and check they’ve gone:

SELECT comment_approved, count(*) FROM wp_comments group by comment_approved;
+——————+———-+
| comment_approved | count(*) |
+——————+———-+
| 0 | 236 |
| 1 | 2887 |
+——————+———-+

3 thoughts on “WordPress’s Akismet pushes Database to 30x real size”

Comments are closed.