Part 5: Ultimate Comment SPAM Killer – Shield WordPress Security Plugin

WordPress comment spam is the bane of every WordPress administrator’s existence.

Finding the ultimate defense against the never-ending waves of comment spam is the holy grail of comments management.

The Shield plugin for WordPress takes a fresh, new approach to the problem. We’ve all but eliminated WordPress comment spam altogether.

That’s a tall claim, you say, so in this part of the series I’ll dig into how our WordPress Comments Protection/Filter works, and why it’s so darn effective.

You should know: there are TWO types of WordPress comment spam

Every WordPress spam comment falls under 1 of 2 categories:

  1. It’s a comment submitted to your site by a human – a real-life human being putting a comment on your site
  2. It’s a comment submitted by an automatic spam bot

Shield solves the problem caused by both of these types of comment spam. Unlike other spam-fighting techniques, we use 2 different detection engines, because these two types of comments are different in nature.

You can’t treat both types of comment spam the same way.

How do we combat Automatic Bot Comment Spam?

Comment spam generated by bots, by computer programs, is naturally … unnatural.

What exactly do I mean by that?

Submitting comments as a bot follows a certain pattern that is different to human comment spam.

The bots are designed to meet these requirements:

  • Mass commenting – ability to submit 100s or 1000s of comments to millions of websites quickly
  • Comments submission follows the WordPress structure and doesn’t adapt well to tweaks and changes in the form

You’ll notice in this case I don’t even mention the “content” of the spam comments. If you can thwart these 2 basic principles, you win – that is, you can effectively identify and block spam comments from bots without caring about the content.

Shield first works out whether a comment came from an automated spam bot before we even consider analyzing the content.

Techniques we use to identify Automatic Bot comment spam

Bot spam Comments Options

We use several techniques to achieve this based on the nature of bots.

[ 1 ] Bots don’t use JavaScript

Since most bots don’t process JavaScript, we can identify comments from bots using a JavaScript-based trick. It works like this:

  • When the WordPress comment form loads, we add a piece of JavaScript to it.
  • When the JavaScript runs in the user’s browser, it creates a new checkbox and adds it to the comment form.
  • Beside the checkbox, we ask the human visitor to simply check it, as an anti-spam measure. The user doesn’t know (or care) that the checkbox was created with JavaScript… it’s just a normal checkbox.
  • Then, when the comment is sent to WordPress, we look for the presence of this checkbox… if it’s there, then we know a human submitted the comment. If it’s not there, we know that a bot just used the standard WordPress comment form to submit the comment. The bot never included the checkbox because it never “loaded” the page or ran the JavaScript.

That’s an easy win!

This technique isn’t our idea; we adapted it from the Growmap Anti Spambot Plugin (GASP) for WordPress. We built this into our plugin after bots worked out a way to get around their plugin.

We adapted their technique in 1 hugely important way: we dynamically generate a new name for the checkbox in the form. This way, bots cannot know or “guess” the checkbox name they need to include when they submit the comment form.
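The server-side half of this check can be sketched as follows. This is a minimal illustration, not Shield's actual code: the post does not say how checkbox names are generated, so deriving the name from a random per-form value and a server secret is an assumption, and `SECRET_KEY`, `new_form_nonce`, `checkbox_name`, `looks_human` and the `form_nonce` field are all hypothetical names.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; a real site would load this from its configuration.
SECRET_KEY = b"replace-with-a-site-specific-secret"


def new_form_nonce() -> str:
    """Return a random value to embed in the form as a hidden field."""
    return secrets.token_hex(8)


def checkbox_name(nonce: str) -> str:
    """Derive the per-form checkbox name from the nonce and the server secret."""
    digest = hmac.new(SECRET_KEY, nonce.encode(), hashlib.sha256).hexdigest()
    return "cb_" + digest[:12]


def looks_human(post_data: dict, nonce_field: str = "form_nonce") -> bool:
    """True only if the checkbox with the expected per-form name was submitted."""
    nonce = post_data.get(nonce_field, "")
    if not nonce:
        return False
    # Browsers only send a checkbox field when it is ticked, so presence is enough.
    return checkbox_name(nonce) in post_data
```

When rendering the form, the server would call `new_form_nonce()`, write the nonce into a hidden field, and have the page's JavaScript create a checkbox named `checkbox_name(nonce)`. A bot that posts straight to the comment endpoint without running that script has no such field to send.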

[ 2 ] Bots love to fill in forms

We have also added a honey-pot to WordPress comment forms. It’s an old, long-established technique.

A honey-pot is a hidden field that we add to the comment form – a normal human visitor can’t see it and so they won’t enter any value for it.

Since bots look at forms and fill in fake “spammy” values for everything, we know that if we receive a WordPress comment that has a value for this hidden field, it is a spam comment.
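As a rough sketch, the honey-pot test on the server comes down to a single question: did the hidden field arrive with a value? The field name `website_confirm` and the function name below are made up for illustration; the post does not say what Shield calls its field.

```python
# Hypothetical name for the hidden field; any name that looks fillable to a bot would do.
HONEYPOT_FIELD = "website_confirm"


def is_honeypot_triggered(post_data: dict) -> bool:
    """Return True when the hidden field carries any non-blank value."""
    value = post_data.get(HONEYPOT_FIELD, "")
    return bool(str(value).strip())
```

A human never sees the field, so it arrives empty or missing. A form-filling bot gives itself away by putting something in it.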

[ 3 ] Unique Comment Tokens

The truth of the matter is that all comment spam techniques ultimately have a limited lifetime – until the bot developers work out a way around your spam defenses.

And, if your spam defenses are all visible in the browser, bot developers can easily work around them (eventually). This is exactly what happened to the GASP plugin last year.

When we built our anti-spam bot protection, we tried to push it to the server-side, reducing complexity for the user, while also making it more difficult for bots to adapt.

With this, we introduced Unique Comment Tokens. So how do they work?

  • Every single visit, to every single webpage on your site generates a brand new, unique comment token – even if you refresh your page, you’ll get a unique token.
  • This unique, one-time comment token is embedded within the comment form and submitted to the WordPress site along with all the other comment information.
  • When WordPress processes the comment, we examine the unique comment token – we look it up in the comment token database and ask “does this unique comment token match the page ID of the comment?”.
  • If this check fails, we can mark it as comment spam.
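The steps above can be sketched with a small in-memory store standing in for the comment token database. The class and method names are hypothetical, and a real plugin would keep tokens in a database table rather than in a Python dictionary.

```python
import secrets


class CommentTokenStore:
    """In-memory stand-in for the comment token database."""

    def __init__(self) -> None:
        # Maps each issued token to the ID of the page it was generated for.
        self._post_id_by_token = {}

    def issue(self, post_id: int) -> str:
        """Create a fresh random token for one rendering of a page."""
        token = secrets.token_urlsafe(16)
        self._post_id_by_token[token] = post_id
        return token

    def matches_post(self, token: str, post_id: int) -> bool:
        """True if the token exists and was issued for this page ID."""
        return self._post_id_by_token.get(token) == post_id
```

Every page view calls `issue()` and embeds the result in the form. A comment whose token is unknown, or was issued for a different page, fails `matches_post()` and can be marked as spam.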

However, we have taken this unique token concept even further…

We know that spam bots will eventually try to adapt, and to do so, they may first load the page and grab the unique comment token from it, then, with the comment token in hand, try to submit a comment to the page.

To combat this, unique comment tokens have the following properties:

  • Comment Tokens may only be used once. That is, once a comment token has been submitted against a page and it’s valid, we delete the token so it can never be used again.
  • Comment Tokens have a start time – that is, after it has been generated, if a comment is received using a comment token and it’s too “early”, we reject the comment (more on this below)
  • Comment Tokens have an expiry time – that is, after it has been generated, if a comment is received using a comment token and it’s too “late”, we reject the comment (more on this below)

So what is this start / expiry time all about?

Remember the first requirement of a spam bot? They must be able to post 1000s of spam comments quickly.

If/When they grab a comment token, they will try to submit a comment immediately. If they do that, we’ll know it’s a bot. So that legitimate visitors don’t submit comments too soon, we disable the comment form’s button and show them a countdown timer that lets them know when they can submit their comment.

On the other side, we need to ensure comment tokens expire so they cannot be used indefinitely. In the same way, we will be able to detect a comment as spam if a bot waits too long before using a token.

Bots have no way to detect these start/expiry times, which means that even if they do work around the comment tokens system, they still have to get past the timings.

The built-in delays also restrict how quickly they can post comment spam in the first place.
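Here is a minimal sketch of how the single-use, start-time and expiry rules might fit together. The 10-second cooldown and 10-minute expiry are illustrative values only, since the post does not give the real ones. The function names are hypothetical, and leaving a token in place after a too-early attempt is also an assumption.

```python
import secrets
import time

COOLDOWN_SECONDS = 10   # assumed minimum age before a token is accepted
EXPIRY_SECONDS = 600    # assumed maximum age before a token is rejected

_issued_at = {}  # token -> timestamp at which it was issued


def issue_token(now=None) -> str:
    """Create a token and remember when it was issued."""
    token = secrets.token_urlsafe(16)
    _issued_at[token] = time.time() if now is None else now
    return token


def check_token(token: str, now=None) -> str:
    """Return 'ok', 'unknown', 'too_early' or 'too_late' for a submitted token."""
    now = time.time() if now is None else now
    issued = _issued_at.get(token)
    if issued is None:
        return "unknown"            # never issued, or already used
    age = now - issued
    if age < COOLDOWN_SECONDS:
        return "too_early"          # submitted faster than the countdown allows
    if age > EXPIRY_SECONDS:
        del _issued_at[token]       # expired tokens are discarded
        return "too_late"
    del _issued_at[token]           # single use: a valid token is deleted once accepted
    return "ok"
```

The optional `now` argument exists only so the timing rules can be exercised without waiting. In normal use both functions fall back to the current time.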


How we combat Human spam Comments

By far the most difficult comment spam to combat is human spam – that is, someone loaded up your site and manually submitted a comment.

All we can do is try to match the content of the comment against a known list of recognised spam content.

And this is what we do with our human comment spam protection feature.

Using the frequently updated blacklist maintained by Grant Hutchinson, we scan the content of all WordPress comments for any matches on this spam blacklist.

Depending on your preferred settings, you can scan one or more of the following fields:

  • Comment Author
  • Comment Email
  • Comment URL
  • Comment Content
  • IP address
  • User Agent String

For every single word in the blacklist, we scan each of the fields you selected for the presence of that word. Given that the list has 10,000+ entries, this is a lot of processing… but, against Grant’s suggested approach, we don’t use the built-in WordPress blacklist.

We decided to work outside of WordPress’s built-in blacklist for 3 huge reasons:

  1. WordPress’s blacklist scanning function is horribly inefficient as it uses PHP’s preg_match() function (6 times per blacklist keyword!) to look for matches
  2. WordPress’s blacklist scanning function scans all 6 of the fields we mentioned… we wanted to give administrators the option to choose, and thereby reduce some false positives
  3. We don’t want to interfere with your personal blacklist so you can also maintain one alongside this one.
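The scan described above, each blacklist entry checked against each selected field, can be sketched with plain case-insensitive substring matching rather than regular expressions. The field names and the function name here are hypothetical, and this illustrates the idea rather than reproducing the plugin's code.

```python
def find_blacklist_match(fields: dict, selected: list, blacklist: list):
    """Return (field_name, term) for the first blacklisted term found, else None."""
    for term in blacklist:
        needle = term.strip().lower()
        if not needle:
            continue  # skip blank lines in the list
        for name in selected:
            # Case-insensitive substring test on the chosen field only.
            if needle in str(fields.get(name, "")).lower():
                return (name, term.strip())
    return None
```

Passing a shorter `selected` list is how an administrator's choice of fields would take effect: a term that only appears in an unselected field is never reported.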

How is this different to Akismet, and is it better?

When a visitor comments on your site, it’s the code within WordPress and your ‘Discussion’ settings that determine how the comments are handled.

Akismet is the anti-spam plugin that ships with all WordPress installations – Akismet is to WordPress, what ‘Internet Explorer’ is to Windows – it is the default, pre-installed, anti-trust, anti-competition solution for a core platform feature.

But, worse than Internet Explorer, it is a licensed premium service: unless you are an individual, you must pay for a valid Akismet license.

Politics and ethics aside for now, I have never had a good experience with Akismet. I got way too many false positives for my liking which meant legitimate comments would get lost in a sea of spam.

I also don’t like the fact that every comment that enters my site is passed outside of my site and sent to Automattic for processing. This is rather unnecessary in my opinion.

So, our human comment spam filter takes the blacklist mentioned earlier and scans all comments internally, keeping your data on your site.

It doesn’t catch absolutely everything, but it catches most. In any case, the majority of comment spam is caught by the spam bot filter before it even reaches the human spam filter.

Akismet doesn’t have a separate “spam bot” filter and must rely solely on content analysis.

Suggestions and Feedback

What are your experiences with WordPress comment spam? Are you happy with your current service? Have you used Akismet, and are you happy to pay the fee for it?

Please let us know your experiences either with this plugin, or others that you’ve tried. If you think there are ways we can improve ours, drop us a comment below.

Join the discussion 16 Comments

  • Hi,

    I have used Akismet for a long time and have not been very satisfied with it. It seems to take a lot of server resources when you have lots of blogs and lots of spam.

    Just started to use yours recently on one blog. It catches spam very well.

    One question: When the default Spam action is set to “reject and redirect”, where does the Spam go?

    Thanks.

    • Paul G. says:

      Hi Michel,

      Reject and Redirect will basically never allow the comment to even reach your database. It’ll never be saved and the poster will be redirected to the page/post to which they’re posting the comment.

      Let me know if you need any further details.
      Thanks!
      Paul.

  • Greg says:

    Does your system integrate with Disqus or other third-party commenting systems?

    • Paul G. says:

      Hi Greg,

      Thanks for your question.

      The plugin doesn’t currently integrate with the likes of Disqus because these plugins/services have their own custom comments handling altogether and it wouldn’t make sense to start fiddling with their system – we’re likely to break something there 🙂

      Hope that helps.
      Paul.

  • Atul says:

    Hello Paul G.

    First of all thanks for making such an amazing plugin for WordPress that really make some sense. I used other anti comment spam plugins but nothing works like yours. Finally I am happy to use WordPress Security Simple Firewall plugin. Thanks You.

  • Mike Little says:

    How does the unique comment token work with page caching (e.g. WP Supercache or Cloudflare)?

    That is, if everyone receives the same cached version of a page, does this mean only one person can comment (and before the timeout)? Will everyone else served the same cached page trigger the test?

  • Bier says:

    “WordPress’s blacklist scanning function is horribly inefficient as it’s uses PHP’s preg_match() function (6 times per blacklist keyword!) to look for matches”

    Why does WordPress need to call the preg_match function 6 times per keyword? How do you ensure to get the same number of hits with less effort?

    Greetings

    • Paul G. says:

      Why does WordPress need to do that? I’ve no idea… that’s the way the author of this particular code decided to implement this. 🙂

      My approach is to take each “spam” word/pattern and I use “stripos()” on each item of the comment that needs to be checked.

      The truth is that efficiency isn’t hugely important in this area because it’s only run when a comment is posted. I could probably optimize my approach too, but again, it’s not critical.

      Further reading: http://lzone.de/articles/php-string-search.htm

  • Bier says:

    Oh, and another one: Do the comment tokens work with Caching plugins, when the html source code is read from cache?

    • Paul G. says:

      This is something that you’ll have to test with your particular installation(s) and configuration. Aggressive page caching will probably affect this functionality, but that is the double-edged sword that is “caching”.

      I’d be interested to hear what you find with your tests.

      Thanks!

  • Olu Oduwole says:

    Hi there,

    Am glad I came across your plugin and am particularly impressed by numerous awesome feedback you’ve got in this regard. Well done.

    Am particularly facing spammy registrations as well as unprecedented amount of brute force attacks, spammy comments, SQL injections, etc. on my social networking site from bots and humans. Worse still, these rogue bots registered on my site, confirmation email sent to them wasn’t delivered, yet they manage to log in to my site. This is now a frequent problem for me.
    I have WAF CloudProxy by Sucuri but it isn’t helping.

    I went through your Simple Security Firewall plugin descriptions over and over again but not mention of combating fake registrations from bots, Is there anything am missing?

    I now plan to install your Simple Security Firewall plugin alongside https://en-gb.wordpress.org/plugins/registration-honeypot/ to fully combat the fake registrations.

    Please kindly advise me.

    Kind regards,
    Olu Oduwole.

  • Adil says:

    Hi there, thanks for this awesome plugin! I’m having an issue with the bot spam checkbox not appearing for lost password recovery when WOOCOMMERCE is activated. When attempting to recover a lost password, entering a username or email address results in this error message: You must check that box to say you’re not a bot.

    With WooCommerce is activated, the lost password URL is changed from /access?action=lostpassword to /my-account/lost-password/.
