Overused Words in Game Titles

By Shamus Posted Sunday Feb 15, 2015

Filed under: Programming 114 comments

It’s been a running joke for a couple of years that half the games coming out have the word “Dead” in the title[1]. Dead Space, Dead Island, Deadlight, Left 4 Dead, etc. So it got me thinking: Just how common is the practice, really? Is the word “Dead” really as played out as it seems, or is this a case of confirmation bias run amok? Aside from “dead”, what are the top overused words in game titles? Are there any overused words that we just don’t notice?

So I’m going to find out. Since I don’t want to run through and manually enter the name of every videogame ever made, I need a way to automate this. The path of least resistance seems to be to use Steam’s library. Being a PC platform, Steam is obviously missing a ton of games. But this should be close enough for our purposes. This isn’t science, it’s trivia.

Sadly, I can’t find a clean way to extract a full list of titles from Steam. The closest I can come is this file, which looks kind of promising at first. But there’s no way of knowing how old the list is, or if all games are listed.

Worse, the list includes a lot of non-game stuff like DLC and trailers. Which means that if there was a game called Dead Shooter, then it might appear several times in our list like so:

Dead Shooter Guns Pack 1
Dead Shooter
Dead Shooter Release Trailer
Dead Shooter Launch Trailer
Dead Shooter Beta
Dead Shooter Guns Pack 2
Dead Shooter GOTY Edition
Dead Shooter Brady Guide

Ugh. I feel strongly that we should not count “Dead Shooter” EIGHT TIMES. There’s only one game called Dead Shooter so it should only count once.

This wall of text is boring, so enjoy this image of a woman drawing random database crap on a window. Feel free to critique her schema. It actually looks pretty solid to me, except it looks like you can only order one product at a time. Also, I really hope that "password" is actually the hash and not the raw text.

In an ideal world I could just query the Steam database and filter out things like trailers, betas, and DLC. But apparently the paranoid people at Valve don’t want to allow open database access to all the random strangers on the internet? (The nerve!) So we have to settle for parsing this text file and trying to untangle it ourselves.

Some DLC has the word DLC in the title. But not all of it. Sometimes it will use the word “Pack”. Sometimes there won’t be any special descriptor at all: “Dead Shooter – More Guns”. There’s no way to know if that’s DLC, a trailer, a sequel, or a special release of the game without going to the Store page and looking at it. And I’m not going to manually inspect all 15 thousand games in this list. Sorry[2].

Our goal:

  1. Parse a large text file and strip out all the crap that isn’t the title of something.
  2. Remove everything that can be identified as DLC, beta, trailers, etc.
  3. Do a word frequency count on the remaining titles.
  4. A few words shouldn’t be tracked. Sequel numbers aren’t very interesting for this project. Neither are words like “the” and “of”.
  5. Find the top N most overused words and list them, along with their games.
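Here is a minimal sketch of those five steps. It’s in Python purely for illustration (no tool has been picked at this point in the post), and the marker list, the ignored-word list, and the function names are all hypothetical placeholders rather than anything taken from the real script.

```python
import re
from collections import defaultdict

# Hypothetical filter lists; real ones would need tuning by hand.
NON_GAME_MARKERS = ("dlc", "trailer", "beta", "pack", "guide")
IGNORED_WORDS = {"the", "of", "a", "an", "and", "in", "to"}


def is_probably_game(title):
    """Reject titles that contain an obvious DLC/trailer/beta marker word."""
    words = re.findall(r"[a-z0-9']+", title.lower())
    return not any(marker in words for marker in NON_GAME_MARKERS)


def index_titles(titles):
    """Map each interesting word to the set of titles that use it."""
    games_by_word = defaultdict(set)
    for title in set(titles):                      # identical titles count once
        if not is_probably_game(title):
            continue
        # Letters only, so sequel numbers are dropped along the way.
        words = set(re.findall(r"[a-z']+", title.lower()))
        for word in words - IGNORED_WORDS:
            games_by_word[word].add(title)
    return games_by_word


def top_words(games_by_word, n):
    """Return the n most used words, each with its sorted list of games."""
    ranked = sorted(games_by_word.items(),
                    key=lambda item: (-len(item[1]), item[0]))
    return [(word, sorted(games)) for word, games in ranked[:n]]


titles = [
    "Dead Shooter",
    "Dead Shooter Guns Pack 1",
    "Dead Shooter Release Trailer",
    "Dead Shooter Beta",
    "Dead Island",
    "World of Goo",
]
index = index_titles(titles)
print(top_words(index, 1))
```

Note what the sketch can’t do: a title like “Dead Shooter GOTY Edition” has no marker word in it, so it would still be counted as a second game.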

We need to find the best (or at least the most convenient) tool for this particular job.

I actually own tools like this, except my set also includes a special screwdriver that is always magically the kind I don’t need at the moment.

So we need to use a language or a script that’s good at doing complex conditional text parsing. I know some people will say Perl script is a good tool for this job, but I can’t tell the difference between valid Perl and something I typed with my face[3]. This could also be a good job for Python, but I don’t know Python at the moment. I always find myself in one of two states:

  1. This is probably a good job for Python, but I don’t want to stop working on this project to learn a new language. I’ll learn it later when I’m not so busy.
  2. Man, I really ought to learn Python. But I don’t need it for anything right now. I’ll wait until I have a project I can use it on.

So I guess I’ll use everyone’s favorite language, PHP. And to be fair, I think PHP is a good fit here. When you don’t care about stability, security, or performance, and where code maintenance isn’t a concern, then PHP can often be a decent choice.

So I write some PHP to chew through the text and give us the goods.

The result? A complete mess. As usual, it’s Activision’s fault.

The Call of Duty franchise has a ridiculous amount of DLC and trailers, none of which have “DLC” or “Trailer” in the title. So we get page after page of stuff like, “Call of Duty, Call of Duty Singleplayer, Call of Duty 2, Call of Duty 2 Singleplayer, Call of Duty: United Offensive, Call of Duty: United Offensive Singleplayer.” This shoots both “Call” and “Duty” to the top of the list. There are “only” about 12 CoD titles on the PC, but my list is showing nearly one hundred. And there’s no good way to filter this except to remove everything with “Call of Duty” in the title. And that’s fine. Without this one franchise the words “Call” and “Duty” aren’t common at all and shouldn’t appear on our list.

Other notable offenders are Company of Heroes and Total War, which pollute the list with a ridiculous flood of DLC. The last major offender is the Train Simulator games, which have a million little DLC packs that all have the word “class” in the title.

I filter all that crap out. We’re looking for overused words in game titles, not over-monetized games.

There’s one last round of filtering we need to do. We need to remove a lot of descriptive words: Gold, Steam, Online, Episode, and Game. Also sequel numbers and years. Those words are really common, but while “Dead Shooter” seems like it’s overusing the “dead” word, I don’t think anyone minds when an MMO is called “X Online” or when a re-release is called “Gold Edition”. And “Shoot Guy 2014” is arguably just as good a title as “Shoot Guy VII” and “Shoot Guy 7”. All we care about here is the “Shoot Guy” part, not the sequel identifier. These words are descriptive and helpful to the consumer and I don’t think it’s fair to count them as overused. They make game titles less confusing, while putting the word “Dead” in everything makes them more confusing.
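As a rough illustration of that last filtering pass, here is a small Python sketch. The descriptive-word set is just the five words named above, the Roman numeral pattern only goes up to XXXIX, and the function names are made up for this example. It also assumes titles split cleanly on whitespace, so a word with punctuation stuck to it (like “2:”) would slip through.

```python
import re

# The descriptive words named above; hypothetical as a complete list.
DESCRIPTIVE_WORDS = {"gold", "steam", "online", "episode", "game"}

# Roman numerals from I to XXXIX, which is plenty for sequel counts.
ROMAN_NUMERAL = re.compile(r"^(?=[ivx])x{0,3}(ix|iv|v?i{0,3})$")
# Plain digits cover both sequel numbers ("7") and years ("2014").
ARABIC_NUMBER = re.compile(r"^\d+$")


def is_noise_word(word):
    """True for descriptive words, sequel numbers, and years."""
    word = word.lower()
    return (
        word in DESCRIPTIVE_WORDS
        or ARABIC_NUMBER.match(word) is not None
        or ROMAN_NUMERAL.match(word) is not None
    )


def core_words(title):
    """Keep only the words that are not sequel markers or descriptors."""
    return [word for word in title.split() if not is_noise_word(word)]


for title in ("Shoot Guy 2014", "Shoot Guy VII", "Shoot Guy 7"):
    print(title, "->", core_words(title))
```

All three titles reduce to the same “Shoot Guy” core. One known cost of the Roman numeral check is that it also throws away the English word “I”.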

So after filtering out as much noise as I can, here are the top 20 most overused words in game titles:

1. World – 129 titles
2. Dark – 120 titles
3. Star – 107 titles
4. Space – 98 titles
5. Quest – 89 titles
6. Battle – 89 titles
7. Dead – 79 titles
8. Magic – 78 titles
9. Black – 78 titles
10. Ghost – 76 titles
11. Wars – 72 titles
12. Simulator – 66 titles
13. City – 63 titles
14. Kings – 62 titles
15. Dungeon – 61 titles
16. Rise – 61 titles
17. Dragon – 57 titles
18. Deluxe – 56 titles
19. Maker – 54 titles
20. Evil – 54 titles

So that’s something of a surprise to me. I hadn’t ever noticed the overuse of “World” or “Space”. And “dead” – which I expected would be one of the big offenders on this list – is fighting for seventh place.

The list isn’t perfect. I think Crusader Kings DLC is propping up #14, and I noticed #20 counts all games with both “Evil” and “Devil”. “Ghosts” is propped up by Call of Duty: Ghosts and its endless flood of trailers and DLC. I could probably find other flaws in the list if I went digging for them, but this basically satisfied my curiosity. If you want to see it in more detail, here is the top 20 list, including the games.
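The Evil/Devil overlap is what you would expect from plain substring matching, though that’s a guess about the cause and not a look at the actual script. Here is a small Python sketch of the difference between substring and whole-word matching. The function names and the three sample titles are chosen just for this example.

```python
import re


def contains_substring(word, title):
    """Naive check: 'evil' also matches inside 'Devil'."""
    return word.lower() in title.lower()


def contains_whole_word(word, title):
    """Stricter check: the word must stand on its own (\\b = word boundary)."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, title, flags=re.IGNORECASE) is not None


titles = ["Resident Evil 4", "Devil May Cry 3", "Evil Genius"]
substring_hits = [t for t in titles if contains_substring("evil", t)]
whole_word_hits = [t for t in titles if contains_whole_word("evil", t)]
print(substring_hits)
print(whole_word_hits)
```

Whole-word matching has its own cost: “King” and “Kings” become two separate words, so neither approach is free of judgment calls.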

 

Footnotes:

[1] Also, games with ‘half’ in the title are dead.

[2] …that you’re demanding something so unreasonable. What’s your problem, anyway? Sheesh!

[3] Is there a difference?




114 thoughts on “Overused Words in Game Titles”

  1. Wooji says:

    Time to make Dark Star the Battle Quest of Space World – Black Magic of the Dead edition

    1. Senjak says:

      I’d like to also suggest someone make “Dark-Star World: Quest to Battle the Dead Ghost Kings of the Black Magic Space Wars – Evil Deluxe Dragon Edition”.

      1. Ivan says:

        Obviously I wasn’t thinking big enough with my “World of Dark Star: Space Quest”.

        Also it turns out that this (http://en.wikipedia.org/wiki/Darkstar:_The_Interactive_Movie) is a thing.

        1. Grudgeal says:

          Dark Star *would* make for a fairly decent setting/inspiration for a Space Quest game.

          Well, for certain values of ‘fairly’

          1. evileeyore says:

            Also certain values of ‘decent’.

            Also I’m not sure I’d use the word ‘inspiration’ in conjunction with this project.

        2. Incunabulum says:

          I am so disappointed that it is not an interactive version of *this* movie.

          http://en.wikipedia.org/wiki/Dark_Star_%28film%29

      2. Ambitious Sloth says:

        I would play a game that held up to that title. But what is weird is that even though I know it’s just a hodgepodge of popular words, my brain still broke it down into a genre. Though it doesn’t even exist and I don’t even know what it could possibly be like to play.

        Sci-fi RPG. For those wondering.

        1. Neruz says:

          To be honest “Black Magic of the Dead” is a pretty metal subtitle, I would play a game that had that subtitle.

        2. Syal says:

          It would be Korriban from KOTOR.

        3. Sleeping Dragon says:

          The title is so out there that it definitely sounds like a conscious use and it’s been done a lot in recent years (a few off the top of my head: Deep Dungeons of Doom, The Mighty Quest for Epic Loot, Holy Avatar vs the Maidens of the Dead). It’s interesting to see this naming device still works as I do feel it draws my attention.

        4. poiumty says:

          “World of Dark Star: Space Quest” was what I had in mind too. The funny thing is it actually sounds like a very plausible mobile game.

    2. Decius says:

      Star World: Dark Space: Black Magic Battle Quest.

    3. Paul Spooner says:

      “Dark Space World Star” sounds pretty cool, when de-contextualized from existing titles.

  2. Daemian Lucifer says:

    Why didnt you just use wikipedia?It has plenty of lists.

    For example,there is a list of games using steam authentication.
    And more importantly,a bunch of lists of all the games made since forever.

    1. Veylon says:

      I can’t help but notice that “Lists of Lists” – of which the lists of video games lists is one – is a disturbingly large category. I can only hope that it doesn’t grow so large as to itself need to be subdivided…

      1. Daemian Lucifer says:

        Listception!

        But,to be fair,lots of those lists overlap.What you need is just the list of windows games(because pc master race),and then just go through all the alphabetical lists and gather them together.

      2. Nixitur says:

        There’s also the List of lists of lists.
        Which, of course, includes itself.
        I think they should create a list of lists that don’t include themselves.

          1. Nixitur says:

            Yes, thanks for explaining the joke.

        1. Matt` says:

          For more fun, make a list of all the lists that do include themselves.

          It’ll start an edit war over whether the list should be included in itself, and whoever edited last will be right every time!

    2. Wide And Nerdy says:

      The one thing I can say in defense of the Steam approach is the Wikipedia approach (depending on how the list is constructed) could be biased based on what is notable enough for people to think to include. A Steam list, if it’s done by database query, lists the games people might not think of. Of course, it has its own biases based on the particulars of the demographic that uses Steam (and the types of developers that would use Steam. Indies and the triple AAAs that can afford multiplatform development).

      It would be interesting to see what the list looks like for Google Play or Apple Store. I suspect it would probably be heavily influenced by copycats trying to cash in on the more successful mobile games.

  3. DrMcCoy says:

    But apparently the paranoid people at Valve don't want to allow open database access to all the random strangers on the internet?

    Are you sure? The people at SteamDB suggest that:

    All of the basic application and package information we provide (unless noted otherwise) is from Steam itself, and can be acquired by anyone with a regular Steam account

    They’re using SteamKit2, a .NET library accessing the Steam API.

  4. bucaneer says:

    I happen to have some experience acquiring data from Steam so here’s my crack at the problem.

    1. go to Steam search and set up a filter to show only “Games” without DLC and other crap
    2. we want to grab data from all pages. Unfortunately, the search results page handles page flipping through Javascript, so we can’t just grab page sources with curl – instead we need something like this script which returns page source after JS has done its magic
    3. run this in a Bash terminal:
    for i in $(seq 1 184); do curl-phantom.js “$(echo “http://store.steampowered.com/search/?snr=1_4_4__12&term=#sort_by=Name_ASC&category1=998&page=$i”)” | grep ” | sed -r ‘s|.*>([^<]+)<.*|1|'; done (nevermind, the comment form mangles it badly)
    4. now we have a full list of game titles
    5. some more rounds of sed, sort and uniq to get a list of most frequently used words (less sanitized than Shamus’ by choice):

    190 Edition
    99 War
    64 World
    63 Game
    55 Gold
    47 Wars
    47 Simulator
    46 Space
    46 Heroes
    46 Dark
    44 Deluxe
    37 Star
    37 Dead
    34 Battle
    33 Super
    33 Quest
    29 Life
    28 Lost
    28 Adventure
    27 Time

    1. braincraft says:

      So World War Simulator: Gold Edition and Heroes of Dark Space Deluxe are both more played out than Super Dead Battle Quest?

      1. Paul Spooner says:

        Interestingly, if you do a search within Shamus’ list, there isn’t very much overlap between the most-used titles. There’s no “Dark World” or “Space Magic” or things like that.

    2. Mephane says:

      we want to grab data from all pages. Unfortunately, the search results page handles page flipping through Javascript, so we can't just grab page sources with curl – instead we need something like this script which returns page source after JS has done its magic

      Almost all sites that use JavaScript have some form of pure HTML fallback solution when accessed by a browser with JS disabled. I just tested this myself on the steam page you linked to, back/forward and page numbers are just regular hyperlinks.

      1. bucaneer says:

        Right, I missed that. In this case regular curl can do the job after all. However, there are pages on Steam that absolutely do require JS, such as the community market listings page, which is why I made the leap in the first place.

    3. Abnaxis says:

      This is where python would be handy. Really, “war” and “wars” should both be counted the same, but that means stemming the results before counting. I don’t know if it’s easy with a bash script, but I’ve made a web-spider word-counter for a sociology paper about FOSS web pages before, and stemming was really easy there…

      1. bucaneer says:

        After de-pluralizing, lowercasing (which makes little difference) and removing “edition”, “game”, “gold” and “online” the top looks like this:

        148 war
        75 world
        56 hero
        49 dark
        48 star
        47 simulator
        46 space
        44 deluxe
        38 legend
        38 dead
        35 dungeon
        34 king
        34 battle
        33 super
        33 shadow
        33 quest
        30 lost
        29 life
        29 adventure
        27 time
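The comment doesn’t say how the de-pluralizing was done. As one possible reading, here is a deliberately naive Python sketch that folds simple plurals together before counting. It is not a real stemmer, the function names are invented for this example, and the counts are made up.

```python
from collections import Counter


def depluralize(word):
    """Very naive English de-pluralizer; it will mangle words like 'shoes'."""
    word = word.lower()
    if word.endswith("oes") and len(word) > 4:       # heroes -> hero
        return word[:-2]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]                             # wars -> war
    return word


def merge_plurals(counts):
    """Add each word's count to the count of its de-pluralized form."""
    merged = Counter()
    for word, n in counts.items():
        merged[depluralize(word)] += n
    return merged


# Made-up counts, only to show the merge.
raw = Counter({"War": 3, "Wars": 2, "Heroes": 1, "Chess": 1})
merged = merge_plurals(raw)
print(merged.most_common())
```

A proper stemming library would handle the irregular cases this sketch gets wrong.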

        1. Paul Spooner says:

          “World War Hero: Dark Star Simulator” 10/10 GOTY

          1. evileeyore says:

            And it’s first “Deluxe edition” with DLC: Space Deluxe w/ Legend of the Dead Dungeon DLC included.

  5. Liam says:

    More importantly, how did she get so good at writing backwards?

    1. Exasperation says:

      By climbing a trellis in the shape of a klein bottle?

      1. Crespyl says:

        “What is this, a whiteboard for ants!?”

        1. MrGuy says:

          The real whiteboard needs to be at least three times as backwards as this.

    2. Cerapa says:

      It actually isn’t that hard to write backwards if you spend a sufficient amount of time doing it. At the start you do need to check the way you write things a lot, so you can repeat the same things in reverse, but it turns automatic pretty quickly. It also isn’t a skill that fades. You pretty much just need to learn to do it once.

      Yes, I do get very bored in class, thanks for asking.

      And my handwriting is actually better when I write backwards. I blame it on the fact that the world is filled with right-handed assholes, who miss the obvious benefits of writing from right to left.

      1. Nixitur says:

        Yeah, I learned that as well when I got bored in class. That was a few years ago, but I can still do it.
        Strangely, I find longhand to be far easier than normal non-connected writing to write backwards. It just flows together and I don’t have to constantly think about which direction this particular letter goes in.

      2. evileeyore says:

        Also, simply practice writing with your off-hand. Most people will naively write backwards with that hand, so once you get penmanship to an acceptable level you can write with either hand.

        Bonus: You’ll learn to read backwards as well.

  6. Robyrt says:

    Yeah, Crusader Kings has dozens of $2 cosmetic DLC packs, which are routinely sold as a huge bundle for $10 at Steam sales. They also have a bunch of legitimate expansion packs with features like “Map now 30% larger, includes India”, so it’s difficult to weed out the false positives properly.

    1. Matt Downie says:

      You actually get the larger map for free in a patch. The DLC only allows you to play as an Indian ruler.

  7. MichaelGC says:

    Looks like Tom Clancy’s Ghost may have a spectral thumb on the scales! :D Some of those titles sound rather intriguing. I wonder what goes on in Happy Wars? They have Fast Food Clerics, apparently! Able to channel the healing power of tacos and such, one assumes.

    I wonder if the “AAA” list and “indie” list would be markedly different? (If it were even possible to clearly delineate what counts as which, of course.)

    1. Joe Informatico says:

      I wonder that too. “Space”, “World”, “Battle”, and “Quest” I associate with older games (mostly pre-2000s, or maybe pre-2006) and a lot of indies seem to be targeting those retro niches underserved by AAA.

  8. Feriority says:

    As far as perl – it’s hard to tell something you typed with your face from valid perl, but you can write good perl that doesn’t look like something you typed with your face! use strict; and use warnings; help a lot (they tell perl to be more strict about what it considers valid and emit warnings when you input something technically valid but almost certainly not what you want, respectively). Beyond that, it mostly comes down to good code standards (not allowing the crazy facesmash variables except in cases where they have explicit, obvious meanings, like @_ for arguments and $1 in regex matches).

    It’s still very easy to end up writing facesmash perl (which is my new favorite name for it, I called it line noise before), which is why I prefer python for most scripting stuff, but when you’re just trying to do something with a giant pile of text, perl’s built-in handling for text is hard to beat.

    1. Neko says:

      Indeed. One of the reasons I like Perl so much is because it is such an expressive language that if you really want to facecode, you can (but you’ll hate yourself in the morning). If one is so inclined, you can also code in a more Modern Perl style.

      I’m actually not a fan of python; its philosophy is “there is exactly one way to do it” and attempts to get everyone to write clean “pythonic” code in that paradigm, but it is entirely possible to write terrible code in any language regardless of forced whitespace. Also it doesn’t have lexicals! (or, technically I think there was progress made towards that in python 3, with a weird ‘nonlocal’ keyword, but that doesn’t help avoid the sort of bugs I would hit if I just had a ‘my’ or ‘var’ keyword for every declaration…)

      Edit: Oh, and in regards to use of $1 for regexp matches, these days I avoid using them for anything but the most trivial of regexps; use the named captures introduced in 5.10, ideally in combination with /x for whitespace-heavy pretty-printed patterns. (?<areacode> ⧵d{2,4} )(?<phone> ⧵d{8} ) for example, captures to $+{areacode} and $+{phone}. Much less room for mistakes.

      Edit Edit: Had to use ⧵ (U+29F5) instead of backslash to avoid the obvious in hindsight comment munging designed to prevent nasty hackers from persuading PHP to do stupid stuff.

      1. Abnaxis says:

        I find it funny that whatever unicode escape you used is not supported by whatever character set this random laptop I’m viewing it from has installed. I just see a bunch of boxes.

  9. Falling says:

    TOR did something similar with Sci-Fi and Fantasy book titles. Theirs was limited to titles from the previous decade. The post was from 2011, so I assume it was 2000-2010.

    http://www.tor.com/blogs/2011/03/best-of-the-decade-data-common-words-in-titles

    Their Top 10 was
    Shadow
    Dragon
    War
    Night
    Dead
    City
    Dark
    Blood
    Magic
    World

    1. sheer_falacy says:

      And John Scalzi went the extra mile and made http://www.tor.com/stories/2011/04/the-shadow-war-of-the-night-dragons-book-one-the-dead-city-excerpt, which got a Hugo nomination. The author’s response was, and I quote, “AH HA HA HA HA HAH HA HA HA HAH HA HA.”

      1. Alexander The 1st says:

        I don’t know; I would’ve given him a Hugo nomination for “Although that's not actually a legend. That's really just more of an ambition.”. That was pretty genius delivery.

        1. Cuthalion says:

          +1 to these comments and that story

        2. swenson says:

          I saw it coming and it was still beautiful.

      2. 4th Dimension says:

        Ah Scalzi you magnificent bastard :)

      3. Falling says:

        Is that a real book? I remember coming across that post, but I assumed it was just a joke because he posted on April 1.

    2. Thomas says:

      It’s interesting just how large the crossover is between the two lists.

      1. swenson says:

        Well, a ton of videogames are fantasy or sci-fi, that probably accounts for a lot of it.

  10. Brian says:

    I’d almost be more interested in a bigram analysis. You can use semantic vectors *really* trivially for this job on your current data. One of my students and I then turned that into a quite lovely word cloud (take a look at the appendices in her thesis near the end) using wordcram which allows you to shape the wordcloud to a 2 bit background image. (I.e. you could fit the bigrams on a posterised call of duty cover, for maximum irony.)

    1. Feriority says:

      Even better, you could use bigrams to create game-naming markov chains and auto-generate some excellently generic video game titles.
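A minimal sketch of that idea in Python, assuming a first-order (bigram) chain built over whole words. The start and end markers, the function names, and the four sample titles are all choices made for this example.

```python
import random
from collections import defaultdict

START, END = "<start>", "<end>"


def build_chain(titles):
    """Record, for every word, the words seen directly after it."""
    chain = defaultdict(list)
    for title in titles:
        words = [START] + title.split() + [END]
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
    return chain


def generate_title(chain, rng, max_words=8):
    """Walk the chain from START until END or the word limit is reached."""
    word, out = START, []
    while len(out) < max_words:
        word = rng.choice(chain[word])
        if word == END:
            break
        out.append(word)
    return " ".join(out)


titles = ["Dead Island", "Dead Space", "Space Quest", "Call of Duty"]
chain = build_chain(titles)
print(generate_title(chain, random.Random(0)))
```

Because followers are stored with repeats, common word pairs are picked more often, which is all the weighting a babbler like this needs.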

      1. MrGuy says:

        I love a good markov babbler as much as the next person, but I don’t think it would work terribly well for videogame titles, mostly because most videogame titles are so short. Four words is a fairly long game title…

        1. MrGuy says:

          Hmm…and to test my own theory, I fed Shamus’ list text (unedited) into a markov babbler I found online. Below are the first ~30 titles it spit out.

          I’m surprised that most of these are actually fairly short, though obviously you have leakers like “Endless Space Marine…” (which would IMO have been a great title if it had ended after 3 words…)

          There’s some biases towards reality (“Battlefield 2” and “Dead Island” are both here), which you’d expect with a Markov simulation.

          Personal favorites:

          Elegy for a Digger Simulator
          Doorways: The Deadly Intent
          Dead Island Thunder
          Empires: Dawn of Islam
          and, of course, the classic broswimmer Call of Diving

          Top 30 of my list:

          Ghost Recon Phantoms – Legend Guide
          Avadon: The Outcasts, Space Rangers
          Space Program Manager
          Fractured Space
          Endless Space Marine – 62 titles A Dead – Army of the Rain-Slick Precipice of Loath Nolder
          Necronomicon: The Ripper
          Call of Diving
          Elegy For A Digger Simulator
          Agricultural Simulator 2
          PAM Space QA
          Kerbal Space 2
          Battlefield 2
          Battlefield Bad Company 2 – The Secret of Ashworld
          Doorways: The Deadly Intent
          Thief: Deadly Intent
          Thief: Deadly 30
          Dead Island
          Dead Island Thunder
          Tom Clancy’s Splinter Cell Blacklist Deluxe Scenario
          The Dragon Helm
          Spiral Knights: Iron King
          Castlevania: Lords of Heroes: Going Rogue
          City of Duty: Ghosts – Libro Mission
          Construction Simulator 2011: Extended Edition
          Blackguards Deluxe Edition
          Empires: Dawn of Islam
          Crusader Kings II: The Mighty Quest for Epic Quest 2
          Commander: Conquest of Magic VI
          Magicka: Wizard Wars

  11. Noah Gibbs says:

    If you ever decide to take another crack at it… Here, you’re actually measuring the most *used* words in game titles. To get the most overused, grab a word frequency list from somewhere and compare.

    Common choices of word frequency lists include things like the complete works of Shakespeare, which may be slightly odd for this use or may be fine.

    Then, divide each prevalence in game titles ratio with the prevalence in English. That should make words like “Black” and “City” drop a bit — they’re pretty common in English, too. But it will make words like “Ghost”, “Dragon” and “Quest” pop up toward the top — they’re not as common in English, but crazy-common in game titles.

    Though you’d want to filter to words that have reasonable usage levels in both game titles and English. Otherwise you’d get a bunch of stuff where if a word was used only once in all of Shakespeare but three times in game titles it’ll look crazy-overused.
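The ratio described in this comment could be sketched in Python like this. The function name and the two minimum-count thresholds are assumptions for this example. The “black” and “quest” title counts come from the top 20 list in the post, while “zorp” and every “English” count are invented.

```python
def overuse_ratios(title_counts, english_counts, min_title=3, min_english=5):
    """Rank words by (share of title words) / (share of English words)."""
    title_total = sum(title_counts.values())
    english_total = sum(english_counts.values())
    ratios = {}
    for word, n_title in title_counts.items():
        n_english = english_counts.get(word, 0)
        # Skip words too rare in either corpus to give a meaningful ratio.
        if n_title < min_title or n_english < min_english:
            continue
        ratios[word] = (n_title / title_total) / (n_english / english_total)
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


title_counts = {"black": 78, "quest": 89, "zorp": 3}
english_counts = {"black": 400, "quest": 20, "zorp": 1}   # invented numbers
ranked = overuse_ratios(title_counts, english_counts)
print(ranked)
```

A ratio above 1 means the word shows up more in game titles than its general frequency would predict. The thresholds are what keep a word used once in the reference corpus from looking crazy-overused.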

    1. Daemian Lucifer says:

      I didn’t expect the kind of literary statistician.

      1. Falling says:

        NOBODY expects the Literary Statistician!

        1. MrGuy says:

          I didn’t ask for the Literary Statistician!

    2. Paul Spooner says:

      Yes! This needs linguistic normalization.

    3. Mrcl Pfffr says:

      Is Shakespeare’s work really representative of contemporary English?

  12. Chris says:

    I sense this post was written because I was complaining about Darkest Dungeon.

    1. Daemian Lucifer says:

      You should complain more then,make Shamoose stop his music thing and come back to the darkest depths of the dungeon of programming.

  13. Sean M. Paus says:

    There are some cool sounding games hidden in Shamus’s list. I myself can’t wait for Dark Star World, or Ghost Wars Simulator.

    1. Cuthalion says:

      I want to play Ghost Wars Simulator.

      1. Mephane says:

        I want to play Ghost Simulator. Haunt a huge old mansion, cause mayhem and scare the visitors to death.

        1. bigben1985 says:

          You mean like that?

  14. ben says:

    Could you give us a list of actual games that use at least two items from your list? I don’t think it would be too long.

  15. guy says:

    I’m seeing some big false positive spikes on the list. Notably, Darksiders II dlc is holding Dark up there, Space Marine and Space Hulk dlc are inflating Space, and Quest is also counting Conquest.

    1. Joe Informatico says:

      Is “Darksiders” inflating the “dark” count? I was assuming the script only picks out complete words, i.e. bounded on each side by a non-alphabetic character (space, punctuation, whatever), otherwise “Kings” wouldn’t be in the Top 20 while “King” wasn’t.

      1. guy says:

        The script appears to pick out words that are part of larger words, though I’m guessing Shamus made a specific exception for Kings. Darksiders and Underworld trigger Dark and World respectively on the list.

        1. WJS says:

          It’s a tricky problem; I would argue that while plurals should obviously fold together, and clearly “quest” and “conquest” should not, you arguably do want to count word fragments, so “dark” and “darksiders”, or “craft”, “warcraft” and “minecraft” should count together. This is obviously an order of magnitude more complex.

  16. kikito says:

    If there’s a way to get sales data per game, you could get a list of the top-selling words to have in a game title, too.

    1. Paul Spooner says:

      “Craft” is in both WOW and MC. I’d bet it comes out in the top five.

  17. Blake says:

    Protip:
    A base game on steam will always have an appId that ends in 0 (the inverse is not true, if something has 10 or more DLC packs then some of them may also end in 0).
    This means you can discard anything with an appId that doesn’t end in 0.
    Of course this is probably useless knowledge now as bucaneer seems to have solved it.
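Taking the rule in this comment at face value (it is the commenter’s claim, not something verified here), the filter is a one-liner. This Python sketch assumes the list has already been parsed into (appId, title) pairs. The function name and the sample IDs are made up.

```python
def likely_base_games(apps):
    """Keep entries whose appId ends in 0, per the rule of thumb above."""
    return [(app_id, title) for app_id, title in apps if app_id % 10 == 0]


# Made-up IDs, only to show the filter.
apps = [
    (1230, "Dead Shooter"),
    (1231, "Dead Shooter Guns Pack 1"),
    (1232, "Dead Shooter Release Trailer"),
    (1240, "Shoot Guy 7"),
]
print(likely_base_games(apps))
```

As the comment notes, the inverse doesn’t hold, so some DLC with an appId ending in 0 would still get through.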

  18. gresman says:

    In advance I want to say that I am a bit tired and did not have the head to find something that would keep the names of games together to get a nice list for reference. Maybe I will think about it tomorrow.

    My suggestion for removing some of the false positives would be the following:
    1. ToLower everything
    2. Separate by common separators (;, :. , -)
    3. Put everything in a list/dictionary/array
    4. Clean out all the separator entries and some weird numbering
    5. Go through the list and count each word in it
    6. Write that in your output array

    I know the O notation for that thing is horrid but it should do the job. Maybe there is room for more startup cleanup like ignoring sequels altogether.

    Sounds like an interesting project. Have to think about that. :)
    Good night to everyone!

  19. Rick says:

    I didn’t see “World” being on the list at all, but there it is. Out of nowhere, right under our noses.

    You did pretty well leveraging PHP’s string functions, but it looks like it messed up a little on the full list page… “Tomb Raider: Underworld” (all mentions) became “Tomb Raider: Undeworld”.

  20. Sabrdance (MatthewH) says:

    For my own amusement -here’s how I’d have tackled the program just using a basic stats program (STATA if I were at work, SPSS if I were desperate.)

    The best solution would be for the person who created the database to give it a meaningful ID code so we could sort based on that, rather than having to futz with text. (first 5 numbers ID publisher, next 5 ID series, next 5 ID entry in series -do games have something like Dewey Decimal IDs?)

    But if we must futz with text, I’d think the best solution would be to extract a string from the titles. If there’s a standard format, you could, say, pull everything before the first semi-colon, or the first 10 characters. Then search for duplicates and drop them. that should reduce the number of cases you’d have to manually check.

    If I could pull up some even more esoteric software, there are programs that can identify roots from a string of text and extract them, which would fix needing either an arbitrary number of characters or a standard format.

    1. Peter H. Coffin says:

      Key IDs should never be meaningful. Bad habit to let happen. Probably won’t matter for a throwaway database, but for anything you expect to ever change…

      1. Abnaxis says:

        Given that this person is using Stata (and from my own experience) I’m guessing they work more in data analysis than in data storage or database administration.

        Keeping away from meaningful IDs is all well and good when you’re designing your own database for a web page from scratch, but when you’re trying to merge the data from 3 different Census samples for a nested multi-level analysis with hundreds of thousands of samples, you thank the goddamn bloody stars if the data sets used consistent key IDs, or you curse the children of the creators if you have to make them yourself.

  21. While it seems to only appear in four titles on Steam, I’ve always found the word “unleashed” to be over-used. This mostly comes (I suppose) from my years of reading comic books. I think every major comic book character has been “unleashed” on a cover somewhere, with Wolverine experiencing an “unleashed” at least annually.

    1. Mike S. says:

      “When Titans Clash”

      1. And the other favorite: “Everybody DIES!”

        1. Dev Null says:

          Sequel to “Rocks Fall”, as I recall…

    2. Purple Library Guy says:

      Makes you wonder about who’s keeping all those super heroes on a leash the rest of the time.

      1. Stan Lee is a dirty old man, that’s for sure.

  22. Humanoid says:

    I would like to institute a general moratorium against the use of the word ‘of’ in game titles, or at least specifically titles in the form of X of Y.

    1. Daemian Lucifer says:

      Same should go for the HURK sign.

  23. Galad says:

    While I read that article with a content smile on my face, I don’t have any meaningful feedback, other than:

    I feel like the picture is some Left for Dead reference, but I don’t get it? Would anyone please explain it?

    1. CJ Kerr says:

      It’s the second hit on Google images for “left 4 dead 2”. I presume it’s the header because the first paragraph is about games with the word “dead” in the title.

  24. Steven Taylor says:

    IAMA electrician. This screwdriver is always the screwdriver I need;

    http://www.amazon.com/Klein-32500-Screwdriver-Driver-Cushion/dp/B0015SBILG/ref=sr_1_1?ie=UTF8&qid=1424083404&sr=8-1&keywords=klein+9+in+1

    Unless of course I need a swivel screwdriver, a ratchet screwdriver, a precision screwdriver, or some kind of Torx or larger nut-driver bit.
    Or if I actually need a screw gun, because you don’t want to be there all day, do you?
    But apart from these special cases, and also times where I’m dealing with metric-sized hex-heads, or while I’m working a hot panel and am using my insulated screwdrivers instead, I’m using the Klein 9-in-1.

  25. Syal says:

    No idea how complicated it would be, but if you could get a filter to eliminate repeating two-word phrases (so Call of Duty counts for “call” and “duty” but Call of Duty 2 doesn’t count for anything) it would weed out all those DLCs and sequels. Not like any two-word phrases come up in unrelated game titles.
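One possible reading of this filter, sketched in Python. The stop-word list (which is what lets "Call of Duty" reduce to the pair "call duty"), the function names, and the sample titles are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed stop words; dropping "of" makes "Call of Duty" the pair (call, duty).
STOPWORDS = {"of", "the", "a", "and"}

def tokens(title):
    """Lower-case alphabetic words, minus stop words (numbers are dropped)."""
    words = re.findall(r"[a-z]+", title.lower())
    return [w for w in words if w not in STOPWORDS]

def count_unique_phrases(titles):
    """Count words, skipping any title that repeats an earlier two-word phrase."""
    seen_pairs = set()
    counts = Counter()
    for title in titles:
        words = tokens(title)
        pairs = set(zip(words, words[1:]))
        if pairs & seen_pairs:
            continue  # shares a phrase with an earlier title: sequel or DLC
        seen_pairs |= pairs
        counts.update(words)
    return counts

phrase_counts = count_unique_phrases([
    "Call of Duty",
    "Call of Duty 2",
    "Call of Duty: Black Ops",
    "Duty Calls",
])
print(phrase_counts)
```

The result depends on order: whichever title shows up first gets counted, so if a DLC pack is listed before its base game, the DLC's words are the ones that survive. Sorting the titles shortest-first before filtering might help with that.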

    …I’ve got to say I’m surprised Shadow didn’t make the cut.

    1. Mortuorum says:

      In all fairness, I believe that there are too many games named “Call of Duty”, so maybe it should count.

  26. Vorpal Smilodon says:

    I should make a No Man’s Sky style game called Dark Space: Star World Quest – you’d be traveling through the darkness of space to find the mythical Star World, of course. On a quest, you might say.

  27. Abnaxis says:

    This reminds me of a FUN (in the Dwarf Fortress sense of the word) project I had.

    Take 2 Excel sheets (because of course we couldn’t use database files) filled with student information–one for the main college applications, and one for need-based funding applications–and merge the two, without any sort of key variable*. Oh and by the way, the same person is quite often in each data set multiple times, because the student, their parents, and their academic advisors all might have made separate applications, none of which were constrained to actually fill in the information in the same manner (there were quite a few “Bob” versus “Robert” differences). This isn’t a small set of data either–on the order of fifty-thousand records, and if you mess one up you just potentially screwed some disadvantaged kid out of their chance to go to college.

    What I finally wound up with was a script that looked at name (first and last), address, and application ID**; merged the stuff that matched between the sets; and flagged all the close matches for a human to look through and deal with. Which sounds simple but was a BITCH to iron out.
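The original script is not shown, so the following is only a sketch of the general merge-and-flag approach, using Python's standard `difflib`. The field names, the 0.8 threshold, and the sample records are assumptions, and the application ID check is left out.

```python
from difflib import SequenceMatcher

# Hypothetical records; field names are assumptions.
ADMISSIONS = [
    {"first": "Robert", "last": "Smith", "address": "12 Elm St"},
    {"first": "Ann", "last": "Lee", "address": "4 Oak Ave"},
]
FUNDING = [
    {"first": "Ann", "last": "Lee", "address": "4 Oak Ave"},
    {"first": "Bob", "last": "Smith", "address": "12 Elm St"},
    {"first": "Carl", "last": "Jones", "address": "99 Pine Rd"},
]

def record_key(rec):
    """Normalise the fields being compared into one lower-case string."""
    return " ".join([rec["first"], rec["last"], rec["address"]]).lower()

def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 for two records."""
    return SequenceMatcher(None, record_key(a), record_key(b)).ratio()

def match_records(admissions, funding, threshold=0.8):
    """Merge exact matches, flag close ones for a human, list the rest."""
    merged, flagged, unmatched = [], [], []
    for f in funding:
        best = max(admissions, key=lambda a: similarity(a, f), default=None)
        if best is not None and record_key(best) == record_key(f):
            merged.append((best, f))
        elif best is not None and similarity(best, f) >= threshold:
            flagged.append((best, f))  # close match: needs human review
        else:
            unmatched.append(f)
    return merged, flagged, unmatched

merged, flagged, unmatched = match_records(ADMISSIONS, FUNDING)
print(len(merged), len(flagged), len(unmatched))
```

Comparing every funding record against every admissions record is quadratic, which is tolerable for a sketch but would want some blocking (for example by last name) at fifty thousand records.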

    I have no idea how they do it now, because I know they aren’t using my code. Apparently, the intern before me did all of this by hand. Weekly.

    *:In the years before I got this mess, they used SSN as a key variable. Then they stopped collecting them for privacy concerns (reasonable) without establishing any manner of variable that could be used as a key instead (less reasonable).

    **:Supposedly the students are assigned a unique application ID when they filled out the main admittance application, which is supposedly required before the student can fill out a funding application. Of course, whoever created the form didn’t deign to validate the admittance application number on the funding application, so it never (less than 10% of the time) had a valid number in it. For some reason, most students thought their email went in there (I’m guessing because that’s their log-in name, but I don’t know that for sure)

    1. WJS says:

      I would guess whatever genius designed the form thought “ID” would make a good label.

  28. Shamus, if you revisit this later, you might want to make sure words are actually words.

    Darksiders, for example, is counted as “Dark”, which is wrong; here the word is “Darksiders”.

    You should use space (aka whitespace) as a word delimiter; there is a pattern switch for that in PHP (sorry, can’t recall what it was), and it’s listed in the PHP documentation.

    By the looks of it, Dark and Star get an artificially high count, as there are a lot of words with “Dark” or “Star” in them.

    Then there is the question of EverQuest vs Ever-Quest vs Ever Quest.
    IMO all three are the same: the capitalization implies a dash and thus two words, while if it were Everquest it would be a single word.

    Not sure how to do a pattern/rule to handle that though. You could probably do a [A-Z] rule combined with a starting with rule or sub group using () or something.
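One way such a rule could look, sketched in Python rather than PHP. The function name `title_words` is hypothetical; the idea is to split on whitespace and hyphens first, then treat an internal capital letter as the start of a new word.

```python
import re

def title_words(title):
    """Split a title into lower-case words.

    Whitespace and hyphens separate words, and a capital letter inside a
    chunk starts a new word, so "EverQuest" becomes "ever" and "quest".
    """
    words = []
    for chunk in re.split(r"[\s\-]+", title):
        # A capitalised run, a lower-case run, or a run of digits.
        parts = re.findall(r"[A-Z]+[a-z]*|[a-z]+|\d+", chunk)
        words.extend(p.lower() for p in parts)
    return words

print(title_words("EverQuest"))
print(title_words("Darksiders"))
```

Counting whole words from this list, instead of searching for substrings, keeps "Darksiders" from being counted as "Dark". The capital-letter rule has side effects, though: a name like "BioShock" would also split in two.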

  29. Patrick the Evil Space Ghost Mullet King says:

    This is also the exact same list for heavy metal songs written in the last 40 years.

  30. nm says:

    Regarding database schema lady, password should be the (hash, salt) tuple returned by a decent password hasher like bcrypt.
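As a rough illustration of storing a salted hash instead of the raw password: bcrypt is a third-party package in Python, so this sketch swaps in PBKDF2 from the standard library's `hashlib`. The iteration count and function names are assumptions, not a recommendation. (bcrypt implementations typically pack the salt and hash into a single string, so one column would do.)

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your own hardware

def hash_password(password, salt=None):
    """Return a (digest, salt) pair; a fresh random salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return digest, salt

def verify_password(password, digest, salt):
    """Re-hash the candidate with the stored salt and compare in constant time."""
    candidate, _ = hash_password(password, salt)
    return hmac.compare_digest(candidate, digest)

stored_digest, stored_salt = hash_password("hunter2")
print(verify_password("hunter2", stored_digest, stored_salt))
```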

  31. Patrick the Evil Space Ghost Mullet King says:

    This would also be the list presented to a marketing team when selecting the next flavor of Mountain Dew.

  32. Duoae says:

    Hah!

    This post is excellently timed after the third or fourth time I wandered into a Dead State thread thinking it was about State of Decay!

  33. Volfram says:

    I thought it would be fun to mess around with this list, so I took the top 10 words and wrote a small program to spit out pairs of them.

    Then I spit out every pair with minor criteria to reverse them in cases where that seemed to make sense.

    Here’s a list of 58 hit titles for your next video game. (You’ll note most of these are already taken…)

    1: Dark World
    2: Star World
    3: Space World
    4: Quest World
    5: Battle World
    6: Dead World
    7: Magic World
    8: Black World
    9: Ghost World
    10: Star Dark
    11: Dark Star
    12: Space Dark
    13: Dark Space
    14: Quest Dark
    15: Dark Quest
    16: Battle Dark
    17: Dead Dark
    18: Dark Dead
    19: Magic Dark
    20: Dark Magic
    21: Black Dark
    22: Ghost Dark
    23: Dark Ghost
    24: Space Star
    25: Quest Star
    26: Battle Star
    27: Dead Star
    28: Magic Star
    29: Black Star
    30: Ghost Star
    31: Quest Space (Suspiciously similar to a hit series from the 1990s. Also one of the few cases where my “Try swapping the words” algorithm missed.)
    32: Battle Space
    33: Dead Space
    34: Magic Space
    35: Black Space
    36: Ghost Space
    37: Battle Quest
    38: Dead Quest
    39: Magic Quest
    40: Black Quest
    41: Ghost Quest
    42: Dead Battle
    43: Battle Dead
    44: Magic Battle
    45: Battle Magic
    46: Black Battle (I’d skip this one, it sounds racist.)
    47: Battle Black
    48: Ghost Battle
    49: Battle Ghost
    50: Magic Dead
    51: Dead Magic
    52: Black Dead
    53: Ghost Dead
    54: Dead Ghost
    55: Black Magic
    56: Ghost Magic
    57: Ghost Black
    58: Black Ghost
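The original program is not shown, so this is only a guess at its shape, in Python. The ten words are recovered from the pairs in the list above; the function name is hypothetical, and the "minor criteria" for when to reverse a pair are unknown, so the sketch either emits both orders or just one.

```python
from itertools import combinations

# The ten words that appear in the list above.
TOP_WORDS = ["World", "Dark", "Star", "Space", "Quest",
             "Battle", "Dead", "Magic", "Black", "Ghost"]

def title_pairs(words, also_reversed=True):
    """Build two-word titles from every unordered pair of words."""
    titles = []
    for first, second in combinations(words, 2):
        titles.append(f"{second} {first}")
        if also_reversed:
            titles.append(f"{first} {second}")
    return titles

for number, title in enumerate(title_pairs(TOP_WORDS, also_reversed=False), start=1):
    print(f"{number}: {title}")
```

Ten words give 45 unordered pairs, or 90 titles if every pair is also reversed; the 58 above sit in between because only some pairs were swapped.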

  34. Duffy says:

    That table design in the picture is a little bit weird. If you want to allow for multiple products in an Order you would need to place another table in-between Orders and Products that links Orders to products in a One to Many relationship. Thus a User can be linked to many Orders and each Order can be linked to many Products. Database design is important! Especially if the front end shoehorned multiple product orders in despite the database not supporting them directly. Imagine having to unravel this thing into a proper schema after it’s been used for 5+ years. Blah.
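A small sketch of that in-between table, using Python's built-in `sqlite3`. The table and column names here are hypothetical, not read off the picture.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders   (id INTEGER PRIMARY KEY,
                       user_id INTEGER NOT NULL REFERENCES users(id));
-- The in-between table: one row per product on an order.
CREATE TABLE order_items (
    order_id   INTEGER NOT NULL REFERENCES orders(id),
    product_id INTEGER NOT NULL REFERENCES products(id),
    quantity   INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (order_id, product_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 [(1, "sword"), (2, "shield")])
conn.execute("INSERT INTO orders VALUES (1, 1)")
# One order, two products.
conn.executemany("INSERT INTO order_items (order_id, product_id) VALUES (?, ?)",
                 [(1, 1), (1, 2)])

rows = conn.execute("""
    SELECT p.name
    FROM order_items AS oi
    JOIN products AS p ON p.id = oi.product_id
    WHERE oi.order_id = 1
    ORDER BY p.name
""").fetchall()
print(rows)
```

Each order has many rows in `order_items` and each product can appear in many of them, so between them the two one-to-many links let an order hold any number of products.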

    Oh yea, cool article. Didn’t expect World to be winning, but kinda makes sense.

  35. allfreight says:

    Shamus, please spend a day learning Python. You will be so much happier.

  36. Phantos says:

    I can tolerate a lot of weird stuff in titles, but I think I’m officially sick of seeing the word “Rise”.

    I don’t know if it’s just lazy, or if I’m just having war flashbacks to The Dark Knight Rises.

  37. N/A says:

    That woman is awesome! She can write backwards, especially technical material!

  38. WJS says:

    Hmm, that seems like rather a hacky way to filter out DLC and the like; I’d probably try poking the Steam API a bit more, see if there’s a page you can plug in an appid and get some per-item data and constrain the list to actual releases that way.
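A hedged sketch of what that could look like in Python. The storefront URL and the shape of its JSON (an entry keyed by appid, with a `success` flag and a `data.type` field) are assumptions that are not in the original post and should be checked against current Steam documentation; the function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: this endpoint and its response shape are not from the post.
APPDETAILS_URL = "https://store.steampowered.com/api/appdetails?appids={appid}"

def fetch_appdetails(appid):
    """Download the per-item data for one appid (performs a network request)."""
    with urlopen(APPDETAILS_URL.format(appid=appid)) as resp:
        return json.load(resp)

def is_game(payload, appid):
    """True if the payload describes an actual game rather than DLC or a trailer."""
    entry = payload.get(str(appid), {})
    if not entry.get("success"):
        return False
    return entry.get("data", {}).get("type") == "game"

# Made-up payloads in the assumed shape, so the filter can be tried offline.
sample_game = {"10": {"success": True, "data": {"type": "game", "name": "Dead Shooter"}}}
sample_dlc = {"11": {"success": True, "data": {"type": "dlc", "name": "Dead Shooter Guns Pack 1"}}}
print(is_game(sample_game, 10), is_game(sample_dlc, 11))
```

Calling `fetch_appdetails` once per appid would be slow for the whole list and may be rate-limited, so caching the responses would be sensible.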

  39. Rick says:

    Poor little DLC Quest didn’t stand a chance.
