Moderation is Broken

By Shamus Posted Sunday Apr 26, 2015

Filed under: Rants 75 comments

I’m getting a lot of complaints about this, so let me just acknowledge it publicly: The anti-spam plug-ins on this site are hosed. At least, one of them is. About a week ago, the system went crazy and began marking 1/3 of all valid comments as possible spam, meaning I have to manually approve them before they appear. This is time-consuming for me and really annoying for people trying to have a conversation, and I’m not sure what to do about it.

The frustrating thing is that it’s not catching more spam, it’s just flagging more valid comments. It still lets a bunch of flagrantly obvious spam through. Like this one:

Please let me know if you're looking for a author for your weblog.
You have some really good articles and I believe I would be a good asset.
If you ever want to take some of the load off, I'd absolutely
love to write some content for your blog in exchange for a link back to mine.

Please shoot me an email if interested. Regards!

  1. I’m pretty sure this text has appeared before, and been flagged as spam before.
  2. The URL is packed with spam words.
  3. The comment has tons of stupid line breaks in the middle of sentences, which humans NEVER do[1] and spammers do very often.
  4. The name isn’t in English.
  5. This is a first-time comment from this “visitor”.
  6. It’s commenting on a post from April 3rd.

But WordPress thought this comment was just fine and let it through. Then it turned around and flagged dozens and dozens of comments from:

  1. People with no URL in their name or in the text of their comment. (Which means they can’t be spammers, since there’s no payload.)
  2. People with properly formatted comments.
  3. Comments where everything is clearly in English and contains no common spam words like “pills” or “SEO” or other bullshit.
  4. Comments from people who have thousands(!!!) of approved (by me) comments on this site.
  5. Comments on stuff from two days ago, which spammers can never manage.

This is some 1998 level spam filtering. I can’t believe it’s this bad. I can’t believe it worked a month ago and somehow all went sideways overnight.
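For illustration, the signals in the two lists above can be combined into a few simple rules. This is a minimal sketch in Python, not how WordPress or any of the installed plug-ins actually work: the `Comment` fields, the thresholds, and the function names are all assumptions made up for the example.

```python
import re
from dataclasses import dataclass

# All names and thresholds below are hypothetical, chosen for illustration only.
URL_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)
SPAM_WORDS = {"pills", "seo"}   # the two examples named in the post
TRUSTED_APPROVED_COUNT = 50     # assumed cutoff for a long-standing regular
OLD_POST_DAYS = 14              # assumed cutoff for a "non-current" post
SENTENCE_ENDINGS = ".!?:"


@dataclass
class Comment:
    author_url: str       # URL attached to the commenter's name, "" if none
    text: str
    approved_count: int   # earlier comments by this author that were approved
    post_age_days: int    # age of the post being commented on


def has_payload(comment: Comment) -> bool:
    """A comment with no URL anywhere has nothing to advertise."""
    return bool(comment.author_url) or bool(URL_PATTERN.search(comment.text))


def midsentence_breaks(text: str) -> int:
    """Count line breaks that land in the middle of a sentence."""
    lines = text.splitlines()
    count = 0
    for current, following in zip(lines, lines[1:]):
        current = current.strip()
        if current and following.strip() and current[-1] not in SENTENCE_ENDINGS:
            count += 1
    return count


def looks_suspicious(comment: Comment) -> bool:
    """Return True if the comment should be held for manual approval."""
    if comment.approved_count >= TRUSTED_APPROVED_COUNT:
        return False                      # long-standing regular
    if not has_payload(comment):
        return False                      # no link, so no payload
    words = set(re.findall(r"[a-z]+", (comment.text + " " + comment.author_url).lower()))
    signals = 0
    signals += bool(words & SPAM_WORDS)
    signals += midsentence_breaks(comment.text) >= 1
    signals += comment.approved_count == 0
    signals += comment.post_age_days > OLD_POST_DAYS
    return signals >= 2
```

The two early returns encode the exemptions from the second list: a regular with many approved comments, or a comment with no link at all, is never held. Everything else needs at least two spam signals before it goes to moderation.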

I’m using a lot of spam filters. With NO filters, I’d get a dozen spam a minute and the blog would be crushed in an avalanche of nonsense. I’ve got a lot of spam filters in place to hold off the worst of it, and it’s clear that one of my filters has gone Airport Security on us and begun freaking out over trivialities. There aren’t any quick, obvious answers, so I’m going to need a big block of time to sort it out properly.

I’ll fix it when I get the time, but I’m currently re-installing Windows* and restoring files from backup so I can get back to work after I replaced a sketchy hard drive a few days ago.

* The actual Windows installation is quick. It’s the “everything else” that takes bloody days. Programming, source control, image editing, local WAMP server, disabling the annoying shit in Windows, audio editing, Steam, text editing, FTP, MP3 player, and a dozen other little programs and adjustments and tweaks. It actually takes a few days to get my computer back up to speed after a Windows re-install.

So I ask for patience. I know having so many comments put into moderation is maximally annoying. I’ll get to this as soon as I can.

 

Footnotes:

[1] On my blog




75 thoughts on “Moderation is Broken”

  1. Micamo says:

    Moderation system is broken, post ponies!

    Serious comment: How hard would it be to write a script that auto-flags certain comments as OK even if they trip the spam filter? e.g., known-good username/email combos, comments with no URLs in the message text, etc.

    1. Daemian Lucifer says:

      You mean like this?

    2. Endymion says:

      You just made me check the database to see if there are any pony images tagged as being anyone from here, or reginald cuftburt.

      Sadly there aren’t…. yet.

      So instead I’ll give you a hitman blood money pony.

  2. Cybron says:

    I wonder how feasible it’d be to whitelist users.

    1. Domochevsky says:

      WordPress already provides that functionality by default, via the “first post must be approved by a mod” function. After that WordPress remembers your name/email combo and lets you through, which works quite excellently against spam (never shows up since it has to be approved first), but does put more work on Shamus and whoever else does blog moderation.

      Incidentally, Shamus, do you have Antispam Bee in your pile of antispam plugins?
      It’s got my personal seal of approval.

  3. Viktor says:

    I’m a bit surprised you haven’t written your own spamfilter at this point. I mean, don’t do it now since you’re busy, but by now you know how spammers operate. It seems like you could knock out 90% of the spam with a few simple rules, and avoid false positives with a few more.

    1. MadTinkerer says:

      But that would involve Web Development. Shamus is an Actual Programmer, so his area of expertise is 3D maths and stuff. Web Development is completely different: you have to learn a ton of relatively-simple-to-understand-but-ridiculous-by-sheer-volume amount of stuff, half of which will be obsolete in a month.

      The logic of what he would have to do is simple. But he would need to know how everything else that his spam filter is interacting with works. And he’d have to maintain it according to standards and update it according to updated standards, not because of anyone demanding he do things that way, but just to maintain compatibility with everything else being updated all the time.

      This is why a lot of Actual Programmers avoid Web Development entirely. It’s not that they’re too proud: it’s that 3D graphics engines are actually less work to maintain and understand than certain kinds of web-based applications. You really have to focus on one or the other.

      EDIT: Here’s an example I just thought of:

      Shamus’ usual work involves (metaphorically) writing doctoral science papers to PHDs whose first language is English. Not an easy task, but if you have the education, it’s a straightforward thing to do.

      What you are asking him to do is (metaphorically) to come up with an elementary school science experiment that kids can do. But none of the kids speak English. In fact, he will need to write out the experiment in at least French, German, and Spanish. And there might be a transfer student from Russia in the class next month. And the only German speaking kid might be gone by then.

      Is it worth learning four languages, half of which may or may not actually apply to the specific task, to write out a simple science experiment, or is it better to just write post-doctorate science papers for the one-language-speaking PHDs?

      EDIT 2: Shamus should feel free to correct me on this, but this is my personal take on the situation.

      1. Shamus says:

        There’s the old Dilbert joke where the Boss says, “I’ll assume anything I don’t know how to do is easy.” I sort of have the opposite affliction. Anything I don’t know how to do, I’ll assume requires a massive crash course, hours of forum-diving, and reading of several obtuse white papers, all just so I can make my initial, bad, wrong, buggy first draft.

        So yeah. I imagine writing a spam filter would take me a while.

      2. Alex says:

        I imagine it doesn’t help that in Web Development, your most important “customers” are the ones that want your software to fail and not the ones that want your software to work. It’s probably a good thing that no web developer has become President, because their first official act would probably be to send a cruise missile after the last spammer they had to deal with.

      3. Zoulu says:

        I’m not sure if writing a (very simple, but very specific) spam filter is that close to Web development. I’m a Web and Software developer, and I’m assuming a spam filter would be back-end stuff, where there is no relation to the UI. No HTML, no CSS, no JavaScript. But I never wrote a spam filter…

      4. Deoxy says:

        But that would involve Web Development. Shamus is an Actual Programmer, so his area of expertise is 3D maths and stuff. Web Development is completely different: you have to learn a ton of relatively-simple-to-understand-but-ridiculous-by-sheer-volume amount of stuff, half of which will be obsolete in a month.

        The logic of what he would have to do is simple. But he would need to know how everything else that his spam filter is interacting with works. And he'd have to maintain it according to standards and update it according to updated standards, not because of anyone demanding he do things that way, but just to maintain compatibility with everything else being updated all the time.

        This is why a lot of Actual Programmers avoid Web Development entirely. It's not that they're too proud: it's that 3D graphics engines are actually less work to maintain and understand than certain kinds of web-based applications. You really have to focus on one or the other.

        That is BEAUTIFUL, and I’m stealing it!

        I’m an “actual programmer”, and we’re transitioning our primary in-house software to web-based, and that makes SO MUCH SENSE of the silliness involved. Thank you so very much.

  4. Bropocalypse says:

    I don’t know anything about the practices of spam-filter coding, but it’s interesting to think about. If I were to make one myself, I’d probably base it on a “probability of a post being spam” metric compared with a user-controlled threshold.
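The "probability compared with a threshold" idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the signal names, the weights, and the bias are invented for the example, and a real filter would learn them from comments already labelled as spam or not.

```python
import math

# Hypothetical weights; nothing here comes from a real plug-in.
WEIGHTS = {
    "has_url": 1.5,
    "first_time_commenter": 1.0,
    "old_post": 0.8,
    "spam_word": 2.0,
}
BIAS = -3.0  # assumed baseline: with no signals present, a comment is probably fine


def spam_probability(signals: dict) -> float:
    """Map the signals that are present to a value between 0 and 1 (logistic function)."""
    score = BIAS + sum(WEIGHTS.get(name, 0.0) for name, present in signals.items() if present)
    return 1.0 / (1.0 + math.exp(-score))


def needs_moderation(signals: dict, threshold: float = 0.5) -> bool:
    """Hold the comment when its spam probability reaches the site owner's threshold."""
    return spam_probability(signals) >= threshold
```

Lowering `threshold` holds more comments for approval and raising it lets more through, which is the user-controlled knob described above.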

    1. Decius says:

      Spam-filtering is anti-inductive: If you figure out what rules to follow to separate spam from real users, the rules will change.

      1. Bropocalypse says:

        Theoretically, a sufficiently advanced AI would be as able to discern spam from genuine comments as a human.

        1. silver Harloe says:

          The question is: is that problem harder or easier than just making an artificial humanlike mind?

          1. MadTinkerer says:

            Exactly as difficult.

            EDIT: If you mean perfectly detect whether comments are spam or not, it’s precisely the same (impossible) problem. Imperfectly detecting whether something is likely to be spam or not is much more possible.

            1. methermeneus says:

              I think it all depends on how much the spammers change to slip under the radar. After all, sufficiently human-like spam isn’t really spam anymore.

          2. Decius says:

            Harder. An artificial humanlike mind is insufficient to do quality spam detection.

        2. Decius says:

          Knowing only the content of the message and not the purpose of the site is the problem. There are many sites where a comment that is just a greeting and a link to the user’s homepage is the expected first comment.

          1. WJS says:

            I’m not sure that’s that big a problem; if you were to task an AI on this, you’d presumably train it on what comments were appropriate for your site. Not that I think using an AI would be at all practical for Shamus – giants like Facebook and Google are the ones applying them to their data, not small-time bloggers (no offense meant, Shamus).

  5. Lanthanide says:

    Once you get your computer set up, make a ghost image of the disk, for next time.

    1. blue_painted says:

      I’ve done this before for work PCs... but unless you keep updating the image, by the time you actually need it so much is out of date that it seems quicker(1) to start from fresh.

      (1) I’m never sure it is really quicker, because of the fiddly little bits that Shamus mentions.

  6. Neil D says:

    I doubt this is a new idea for you, but one of the best pieces of advice (that I never follow) is to make an immediate hard drive backup after getting everything installed and working the way you like it. That way you have a quick baseline to go back to and start fresh from whenever you need or want to.

  7. sv_blond says:

    Can’t you “just” make a very specific bot check, like requiring people to play a javascript Mario level? If spammers are targeting you specifically it probably won’t work, but it should stop general purpose bots.

    1. Sorites says:

      Sam Hughes over at qntm has a great spam-filtering device. It’s quick and simple like Shamus’s check-box, but also kinda fun and clever.

      1. sv_blond says:

        Something like that is probably better since it’s just text.

    2. Agamo says:

      This strikes me as a rather poor idea. There’s a certain level of inconvenience people are willing to put up with before they just throw up their hands and walk away.

        1. sv_blond says:

          That’s why I wrote “If spammers are targeting you specifically it probably won’t work.” Also, it wouldn’t even have to be a Mario level, it could just be something like “slide to unlock”.

        2. Dev Chand says:

          The computers, they are quickly catching up to us! Quick, do something before Skynet becomes a reality!

  8. Narida says:

    This might look a bit spam-like, but it isn’t (considering the filters are shot, maybe this is a good thing!):

    I find Ninite (ninite.com) very useful after installing windows for installing all the common programs (FTP, SSH, browsers etc…), maybe it’ll help setting up your system! Just select what you want and it will download and install automatically, no need to go hunting for installers and manually run them.

    1. 4th Dimension says:

      It will not install totalcommander? Minus 5000 points. On the other hand TCMD can be copied into the system.

  9. Cookyt says:

    If the spam is that bad, then it seems to me that the “check this if you aren’t a spammer” box isn’t working. Why don’t you use recaptcha (or one of its variants)? The new one even has a neat feature that presents just a check box to regular users.

    1. Andrew_C says:

      From what I remember he used to use reCaptcha, but it was really ineffective, so he changed to the present system.

      1. Cybron says:

        Current reCaptcha can’t be too ineffective; I know of at least one major site that relies on it and doesn’t seem to have any issues. It is slightly annoying to fill out, though.

  10. shiroax says:

    Would doubling the “check this if you’re not a spammer” do anything? Like “check this if you’re not a spammer and check this if you are”

  11. Jokerman says:

    Why are spammers so bad at appearing human anyway? This has always confused me.

    1. 4th Dimension says:

      Presumably it helps those that use spammer services to eliminate false positive users. They want only dumb ignorant people that they can scam to visit their sites. Anybody else is a waste of their hosting allowances.

      1. swenson says:

        I’ve heard this said about email scammers as well. They don’t want to waste their time with people who might figure out the scam–they want to focus on the low risk, high reward targets who are naive enough to never figure out the scam.

        And when it comes to links in comments anyway, they probably aren’t actually interested in anybody following the links, they just wanna boost their Google ranking by having their site linked from more popular sites.

        EDIT: lol, this comment’s awaiting moderation… sorry, Shamus!

    2. AileTheAlien says:

      My guess is that it’s because they operate out of places where the main language(s) people speak/teach is different than the target language. That combined with the need to have pseudo-random text, so the anti-spam filters don’t just have a lookup-table of known paragraphs / blocks of text.

    3. Decius says:

      Appearing human is hard work. Sending a message like “This is awesome!” is easy and will fool enough of the people enough of the time.

      1. Jokerman says:

        tHis iS AwesOme!

    4. MrGuy says:

      There’s also not always a huge return to sounding human-like.

      Many spammers’ comments are not primarily intended to be read by the readers of the blog in question. This is why so many comment spammers target (or at least are fine with) non-current articles (where the comments are unlikely to be actively read). Link spam comments are intended to manipulate search engine results by making it appear a certain site or article is popular. The comment only has to look sufficiently humanlike to get accepted.

      That said, e-mail spam is generally pretty terrible English, and IS designed to be read by humans, so yeah, there are probably other factors involved.

  12. Rick says:

    As much as I hate captcha, have you thought about trying the No captcha recaptcha?

    It automatically detects most humans then gets out of the way. But it doesn’t help with those spammers who manually visit and copy+paste their junk.

    Edit: ninjad by Cookyt after I loaded the page.

  13. Dirk Destiny says:

    I had an idea for a bot defeater but don’t know where to suggest it (or if it’s worth suggesting). Basically, ask questions that require actual intelligence rather than just pattern recognition.

    For example, present the picture at the bottom of this page and ask questions like:

    How many light blue dice are there?

    What is the sum of the green dice?

    I sometimes find captcha difficult to decipher, and this seems like it would be less frustrating. Is there any obvious flaw with this idea? Is it just the difficulty of generating questions?

    1. Thomas says:

      I wonder if spammers turk problems like this.

      1. Bryan says:

        Spammers absolutely do that with “real” captchas, so I don’t see why they wouldn’t. (Which is also why I don’t think a captcha in any form would fix the problem here.)

        It turns out that “requires a human” isn’t actually enough to stop the spam, even if that was 100% accurate, because there are some places where humans work for *really really* cheap, and the spammers don’t need the people who do their work to live in the places where the commenters are. :-/

        Filtering on the payload seems to me like it might work though. The two important parts of spam (in aggregate, not on any individual site) are the inexpense of sending all the crap, and the nonzero return from people who eventually follow the payload. If you can cut the payload-following to zero, then at least your site is a net drain on the spammer — even if it’s a tiny drain. So if you can hit them where it hurts — in the links that they’re trying to get people to follow — then I think you have a better chance of cutting stuff down.

  14. TehShrike says:

    Which spam system are you using? I haven’t run an open comments section on any of my sites in a while, but last time I did, Akismet seemed pretty much like magic.

  15. Daemian Lucifer says:

    You should really make an image of your system once you get everything together and burn it on a DVD. It won’t help you when you install a new OS, but it will definitely help you if you get another disk failure, or a nasty virus, or stuff like that.

    1. 4th Dimension says:

      A DVD won’t do for the entire system once Windows installs couple of gigabytes of updates + several large program suites like Visual Studio, Office and any Adobe product.

      1. Just cloning it to a smaller hard drive or something would work too

        1. 4th DImension says:

          That it does. Have an archive external drive on which you can also place a clean image of your system.

  16. Peter H. Coffin says:

    Did anyone else make the “all things in moderation” joke yet? It looks like not…

    1. Bryan says:

      All things in moderation, including moderation.

      1. Decius says:

        Moderation ought not to be taken to excess.

        1. Daemian Lucifer says:

          Someone really should moderate these jokes.

          1. shiroax says:

            Make sure to leave in the moderately good ones and better.

  17. Dev Chand says:

    This is some truly scary stuff. Imagine if major systems were run by robots and AI and one day they started malfunctioning like this. Good thing AI and robots both haven’t been developed to that level yet.

  18. hborrgg says:

    Hello Friend! I see that you have made a informative and entertaining blog post. My friend also made a blog post and thanks to GetFamous Max Plus he went viral! With GetFamous Max Plus you could go viral too! Click here to find out more! ww.virussite.lol

    edit: Wait, now I don’t get the “comment awaiting moderation” message?

  19. Nick-B says:

    Why don’t those guys with massive bot nets for DDOS use them on good targets? Y’know, like spam websites and phone-home virus computers. People’s opinions of Anonymous or LizardSquad would skyrocket if they took it upon themselves to fight against the scum-sucking trolls of the internet.

  20. It actually takes a few days to get my computer back up to speed after a Windows re-install

    Wouldn’t it be nice if it was possible for a program to be fully contained in a single folder, where all the needed .dll’s are included in neat sub directories?

    Which allowed non-elevated installation (by default).
    And which placed settings in the system default user setting area.
    And supported folder redirection (so settings etc. and app data can be put elsewhere than C:)

    And where the installer and uninstaller is just a convenient way to unpack and copy the files to the disk, and later remove it (via a handy Uninstall entry you can reach via the control panel for example).

    Sounds like fantasy right?
    Yet I’m able to create software like that (GridStream Player being one such example).
    It does not really need to be installed. It can have the folder moved around from drive to drive and will handle that just fine.

    Why multimillion companies with hundreds of people working on a product can’t seem to do something that simple makes you wonder what the heck they are really doing.

    Ideally you should only need to maybe restore some shortcuts to your programs and copy your settings back over (unless your user data is on the D drive with your programs).

    The way some software just seems to need to wedge itself into the depths of the OS is just insane. There is no logical reason that it has to be that way.

    1. Deoxy says:

      I’ve been wondering this for YEARS, and I’m a professional programmer.

      Sure, there are a very few things that are useful for a whole lot of other programs… but most things USE those things and don’t provide ANY of them themselves.

      Seriously, many open-source games are like this, and they do more graphics and memory intensive things than any of the silly business crap I generally use… ?

      You’re right, it really makes no sense. Heck, as best I can tell, it’s more work to do it the silly way, too.

  21. Neko says:

    I actually did my Honours thesis on email filtering using neural networks (not just spam but arbitrary categories). I’d love to write a plugin for you but it occurs to me that I’d have to do it in PHP… shudder

    (And I say this as someone who loves Perl. I know that some people may think they’re kinda similar, but trust me, they’re not.)

    1. Decius says:

      If you can provide a sufficiently detailed flowchart, code monkeys can implement it in any language.

      I don’t know of any off-the-shelf general-purpose neural networks in PHP, and I think that a proper implementation would use too much server CPU, but neither of those are actually objections, just expected problems.

    2. lethal_guitar says:

      Well, you can always write a PHP-Plugin in C/C++. Or any language, if you wrap it in a layer binding it to PHP’s C API.

      Which is probably required anyway, in order to get decent performance.

  22. Tsi says:

    Having trouble with spammers once again. I guess you’ll have to start making your own anti-spam module or this will never end !

  23. Daemian Lucifer says:

    You know what’s weirdest about this thing? Since the filter started going nuts, not a single one of my comments went to moderation. Despite me using different machines, despite my spammy archer joke. And the filter used to put a few of my comments (maybe every 100th or so) in moderation earlier, especially when I was posting a bunch of them in quick succession.

    So could the problem be tied to the country the comments are coming from?

    1. Sorites says:

      You seem to have a custom avatar, which suggests you’re posting through a logged-in account somewhere.

      1. 4th Dimension says:

        Nope. I too have one of those and ALL my posts get flagged. It might have something to do with my nickname.
        Also you don’t have to be logged on for custom avatar to show up. The server itself requests the avatar based on your email.

      2. Ayegill says:

        The custom avatars are managed through Gravatar. You create a wordpress account and set an avatar, and then whenever you post with that email, your avatar shows up. Since there’s no password control(anyone could be posting using Daemian’s mail), it’s pretty doubtful that this is used to filter spammers.

        1. Daemian Lucifer says:

          (anyone could be posting using Daemian's mail)

          Only in theory though, since the email I’m using hasn’t existed for at least 5 years now. Its only existence is here, and in my head. But if you were to read my mind, you could post as me.

  24. droid says:

    “The comment has tons of stupid line breaks in the middle of sentences, which humans NEVER do.”
    “It's commenting on a post from April 3rd.”

    Fellow mortals: I think I might be a bot. What should I do?

    1. Bubble181 says:

      Step 1: consider a less obvious user name. ;)

  25. Duoae says:

    Haha, Airport Secu-….no, NO! No, I swear I didn’t mean to make that bomb joke – you just made me really nerv-ARUGH! Stop that! ARRGH! Stop it!

    No… No, not the cavity scanner! NOOOoooooo!

  26. Soylent Dave says:

    I still haven’t finished tidying up my PC and doing all the little extra tweaks following my last windows install

    (which was actually an almost completely new computer, but I brought in two old HDDs of stuff so I can ‘sort out what to keep’.

    Hmm. Apparently that was 11 months ago. I should get on that.)
