- Part 1: Why we built the Shield
- Part 2: WordPress Super Admin Protection
- Part 3: WordPress Firewall Feature
- Part 4: WordPress Login and Brute Force Hacking Protection
- Part 5: The WordPress Comment SPAM Killer
- Part 6: WordPress Automatic Updates Management
WordPress comment spam is the biggest bane of every WordPress administrator’s existence on this earth.
Finding the ultimate defense against the never-ending waves of comment spam is the holy grail of comments management.
The Shield plugin for WordPress takes a fresh, new approach to the problem. We’ve all but eliminated WordPress comment spam altogether.
That’s a tall claim, you say, so in this part of the series I’ll dig into how our WordPress Comments Protection/Filter works, and why it’s so darn effective.
You should know: there are TWO types of WordPress comment spam
Every WordPress spam comment falls under 1 of 2 categories:
- It’s a comment submitted to your site by a human – a real-life human being putting a comment on your site
- It’s a comment submitted by an automatic spam bot
Shield solves the problem caused by both of these types of comment spam. Unlike other spam fighting techniques we use 2 different detection engines based on the different nature of these types of comments.
You can’t treat both types of comment spam the same way.
How do we combat Automatic Bot Comment Spam?
Comment spam generated by bots, by computer programs, is naturally … unnatural.
What exactly do I mean by that?
Submitting comments as a bot follows a certain pattern that is different to human comment spam.
The bots are designed to meet these requirements:
- Mass commenting – ability to submit 100s or 1000s of comments to millions of websites quickly
- Comments submission follows the WordPress structure and doesn’t adapt well to tweaks and changes in the form
You’ll notice in this case I don’t even mention the “content” of the spam comments. If you can thwart these 2 basic principles, you win – that is, you can effectively identify and block spam comments from bots without caring about the content.
Shield first works out whether a comment came from an automated spam bot before we even consider analyzing the content.
Techniques we use to identify Automatic Bot comment spam
We use several techniques to achieve this based on the nature of bots.
That’s an easy win!
This technique isn’t our idea; we adapted it from the Growmap Anti Spambot Plugin (GASP) for WordPress. We built this into our plugin after bots worked out a way to get around their plugin.
We adapted their technique in 1 hugely important way: we dynamically generate a new name for the checkbox in the form. In this simple way, the bots cannot ever know or “guess” the checkbox name they need to add to the comment form when they submit it.
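To make the idea concrete, here is a minimal sketch in Python. It is not the plugin's actual PHP code. The names `issued_names`, `render_checkbox_field` and `is_bot_submission` are hypothetical, and the in-memory set stands in for whatever server-side storage a real site would use.

```python
import secrets

# Hypothetical server-side record of checkbox names we have handed out.
issued_names: set[str] = set()

def render_checkbox_field() -> str:
    """Create a fresh, unguessable checkbox name and remember it server-side."""
    name = "cb_" + secrets.token_hex(8)
    issued_names.add(name)
    return name

def is_bot_submission(form_data: dict[str, str]) -> bool:
    """Flag the comment as bot spam unless it carries a checkbox name we issued.

    Browsers only send a checkbox field when it is ticked, so the presence
    of a known name means the box existed in the form and was checked.
    """
    return not any(field in issued_names for field in form_data)
```

A bot that replays a hard-coded field name fails this check, because the name it knows was never issued for the form it is posting to.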
[ 2 ] Bots love to fill in forms
We have also added a honey-pot to WordPress comment forms. It’s an old, long-established technique.
A honey-pot is where we put a hidden field into the comment form – a normal human visitor can’t see it and so they won’t enter any value for it.
Since bots look at forms and fill in fake “spammy” values for everything, we know that if we receive a WordPress comment that has a value for this hidden field, it is a spam comment.
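The honey-pot check itself is tiny. Here is a rough sketch in Python, assuming a hidden field with the made-up name `website_confirm`. The plugin's real field name and code are not shown in this article.

```python
# Hypothetical name of the hidden field; in the form it would be hidden with CSS.
HONEYPOT_FIELD = "website_confirm"

def honeypot_triggered(form_data: dict[str, str]) -> bool:
    """Return True when the hidden field carries a value, which suggests a bot."""
    return bool(form_data.get(HONEYPOT_FIELD, "").strip())
```

A human visitor leaves the field empty (or the browser omits it), so only submissions that filled in "everything" get flagged.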
[ 3 ] Unique Comment Tokens
The truth of the matter is that all comment spam techniques ultimately have a limited lifetime – until the bot developers work out a way around your spam defenses.
And, if your spam defenses are all visible in the browser, bots can easily work around them (eventually). This is exactly what happened to the GASP plugin last year.
When we built our anti-spam bot protection, we tried to push it to the server-side, reducing complexity for the user, while also making it more difficult for bots to adapt.
With this, we introduced Unique Comment Tokens. So how do they work?
- Every single visit, to every single webpage on your site generates a brand new, unique comment token – even if you refresh your page, you’ll get a unique token.
- This unique, one-time comment token is embedded within the comment form and submitted to the WordPress site along with all the other comment information.
- When WordPress processes the comment, we examine the unique comment token – we look it up in the comment token database and ask “does this unique comment token match the page ID of the comment?”.
- If this check fails, we can mark it as comment spam.
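The lookup described in these steps could be sketched like this in Python. The dictionary `token_db` is a hypothetical stand-in for the comment token database, and the function names are invented for illustration only.

```python
import secrets

# Hypothetical stand-in for the comment token database: token -> page ID.
token_db: dict[str, int] = {}

def issue_token(page_id: int) -> str:
    """Generate a new random token for one page view and store it."""
    token = secrets.token_urlsafe(16)
    token_db[token] = page_id
    return token

def token_matches_page(token: str, page_id: int) -> bool:
    """True only if we issued this token and it belongs to this page."""
    return token_db.get(token) == page_id
```

An unknown token, or a token copied from a different page, fails the check and the comment can be marked as spam.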
However, we have taken this unique token concept even further…
We know that spam bots will eventually try to adapt, and to do so, they may try to first of all load the page and grab the unique comment token from it. Then, with the comment token in hand, try to submit a comment to the page.
To combat this, unique comment tokens have the following properties:
- Comment Tokens may only be used once. That is, once a comment token has been submitted against a page and it’s valid, we delete the token so it can never be used again.
- Comment Tokens have a start time – that is, after it has been generated, if a comment is received using a comment token and it’s too “early”, we reject the comment (more on this below)
- Comment Tokens have an expiry time – that is, after it has been generated, if a comment is received using a comment token and it’s too “late”, we reject the comment (more on this below)
So what is this start / expiry time all about?
Remember the first requirement of a spam bot? They must be able to post 1000s of spam comments quickly.
If/When they grab a comment token, they will try to submit a comment immediately. If they do that, we’ll know it’s a bot. So that legitimate visitors don’t submit comments too soon, we disable the comment form’s button and show them a countdown timer that lets them know when they can submit their comment.
On the other side, we need to ensure comment tokens expire so they cannot be used indefinitely. In the same way, we will be able to detect a comment as spam if a bot waits too long before using a token.
Bots have no way to detect these start/expiry times, which means that even if they do work around the comment tokens system, they still have to get past the timings.
The built-in delays also restrict how quickly they can post comment spam in the first place.
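Putting the single-use rule and the start/expiry window together, a check might look roughly like the Python sketch below. The 10-second and 600-second limits are illustrative guesses, not the plugin's real settings. The names are hypothetical, and deleting expired tokens is an assumption added for housekeeping.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional

# Illustrative values only; the plugin's actual timings are not stated here.
MIN_AGE_SECONDS = 10
MAX_AGE_SECONDS = 600

@dataclass
class TokenRecord:
    page_id: int
    created_at: float

# Hypothetical in-memory token store: token -> record.
tokens: dict[str, TokenRecord] = {}

def issue_token(page_id: int, now: Optional[float] = None) -> str:
    """Create a token for this page view and note when it was generated."""
    token = secrets.token_urlsafe(16)
    created = time.time() if now is None else now
    tokens[token] = TokenRecord(page_id, created)
    return token

def check_token(token: str, page_id: int, now: Optional[float] = None) -> str:
    """Return 'ok', or a short reason why the comment is treated as spam."""
    now = time.time() if now is None else now
    record = tokens.get(token)
    if record is None or record.page_id != page_id:
        return "unknown"
    age = now - record.created_at
    if age < MIN_AGE_SECONDS:
        return "too_early"
    if age > MAX_AGE_SECONDS:
        del tokens[token]  # assumption: expired tokens are cleaned up
        return "too_late"
    del tokens[token]  # a valid token is deleted so it can never be reused
    return "ok"
```

The `now` parameter exists only so the timing rules can be exercised without actually waiting. A bot that grabs a token and posts straight away gets `too_early`, and replaying a token that already succeeded gets `unknown`.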
How we combat Human spam Comments
By far the most difficult comment spam to combat is human spam – that is, someone loaded up your site and manually submitted a comment.
All we can do is try to match the content of the comment against a known list of recognised spam content.
And this is what we do with our human comment spam protection feature.
Using the frequently updated blacklist maintained by Grant Hutchinson, we scan the content of all WordPress comments for any matches on this spam blacklist.
Depending on your preferred settings, you can scan one or more of the following fields:
- Comment Author
- Comment Email
- Comment URL
- Comment Content
- IP address
- User Agent String
For every single word in the blacklist, we scan each of the fields you selected for the presence of that word. Given that the list has 10,000+ entries, this is a lot of processing… but, against Grant’s suggested approach, we don’t use the built-in WordPress blacklist.
We decided to work outside of WordPress’s built-in blacklist for 3 huge reasons:
- WordPress’s blacklist scanning function is horribly inefficient as it uses PHP’s preg_match() function (6 times per blacklist keyword!) to look for matches
- WordPress’s blacklist scanning function scans all 6 of the fields we mentioned… we wanted to give administrators the option to choose, and thereby reduce some false positives
- We don’t want to interfere with your personal blacklist so you can also maintain one alongside this one.
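As a rough illustration of scanning only the fields an administrator selects, here is a Python sketch that uses a plain substring check per keyword instead of a regular expression. The plugin's own matching code isn't shown in this article, so treat the function name, the field keys and the case-insensitive comparison as assumptions.

```python
from typing import Optional

def find_blacklist_match(
    comment: dict[str, str],
    blacklist: list[str],
    fields_to_scan: list[str],
) -> Optional[tuple[str, str]]:
    """Return (field, keyword) for the first match, or None if nothing matches."""
    # Lower-case each selected field once, rather than once per keyword.
    haystacks = {f: comment.get(f, "").lower() for f in fields_to_scan}
    for keyword in blacklist:
        needle = keyword.strip().lower()
        if not needle:
            continue  # skip blank lines in the blacklist
        for field, text in haystacks.items():
            if needle in text:
                return field, needle
    return None
```

For example, a comment whose content contains a blacklisted word is caught when "content" is among the selected fields, and passes untouched when only the author and email fields are scanned.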
How is this different to Akismet, and is it better?
When a visitor comments on your site, it’s the code within WordPress and your ‘Discussion’ settings that determine how the comment is handled.
Akismet is the anti-spam plugin that ships with all WordPress installations – Akismet is to WordPress, what ‘Internet Explorer’ is to Windows – it is the default, pre-installed, anti-trust, anti-competition solution for a core platform feature.
But, worse than Internet Explorer, it is a licensed premium service such that unless you are an individual, you must have a valid Akismet license which you pay for.
Politics and ethics aside for now, I have never had a good experience with Akismet. I got way too many false positives for my liking which meant legitimate comments would get lost in a sea of spam.
I also don’t like the fact that every comment that enters my site is passed outside of my site and sent to Automattic for processing. This is rather unnecessary in my opinion.
So, our human comment spam filter takes the spam blacklist mentioned earlier, and scans all comments internally, keeping your data on your site.
It doesn’t catch absolutely everything, but it catches most. In any case, the majority of comment spam is caught by the spam bot filter before it even reaches the human spam filter.
Akismet doesn’t have a separate “spam bot” filter and must rely solely on content analysis.
Suggestions and Feedback
What are your experiences with WordPress comment spam? Are you happy with your current anti-spam service? Have you used Akismet, and are you happy to pay the fee for it?
Please let us know your experiences either with this plugin, or others that you’ve tried. If you think there are ways we can improve ours, drop us a comment below.