About the Author
Mass Effect
Final Fantasy X
Batman:Arkham City
Borderlands Series
Weekly Column
Champions Online
World of Warcraft
DM of the Rings
Good Robot
Project Frontier
Forums
"Music"



Unity Week #9: Preemptive Premature Optimization

By Shamus
on Tuesday May 29, 2018
Filed under:
Programming

 
 

“Premature optimization is the root of all evil.”

That quote by Donald Knuth gets repeated a lot in programming circles. It dates back to 1974, which means that by the standards of computer science it’s more or less the Code of Hammurabi. While I freely admit that Knuth is a better computer scientist when he’s fast asleep than I am at the peak of a caffeine high, I do have some reservations about this particular bit of wisdom. Specifically: Is premature optimization really that dangerous, and what’s the difference between “premature optimizing” and simply building something properly the first time?

In my professional career, I can’t remember a single instance where premature optimization caused a serious disruption. Maybe that’s a side-effect of the domain I work in. Games have very strict performance requirements that would seem completely unreasonable – bordering on fanatical – to someone working on (say) database administration. The performance of your program is continuously expressed to the user through the framerate of your full-screen application. If a thread stalls, if a frame is dropped, if texture data isn’t fetched in time, if you run out of VRAM, if you oversaturate the memory bus, or if one of a dozen or so other systems falls behind, then it will create problems that a human being can see and feel.

Contrast this with something like a Pixar-style render farm that chugs away rendering gigantic images all day long. Both the game and the Pixar-renderer are rendering images. Time is money so there’s a huge financial incentive to get the Pixar program to run as fast as possible, but if there was a bug that wasted 15% of the available processing power, how long would it take someone to notice? Would they? Meanwhile in a game, dropped or late frames give the user a realtime visualization of your sins. It’s hard for the user to not notice performance problems.

I’ve spent a lot of time in this domain that’s obsessed with performance and maybe a little negligent when it comes to maintaining code, and perhaps that time has blinded me to the wisdom of Knuth’s words, but what’s the actual damage of premature optimization? I get that a programmer might blow a bunch of time squeezing an extra 2% performance out of some system that ultimately doesn’t matter, but aside from the squandered programmer hours I don’t see how it hurts a project. And if we’re worried about wasted programmer hours then we have bigger fish to fry than this. Unreadable code, dependency hell, lack of documentation, failure to abide by best coding practices – all of these have devoured far more of my hours than someone’s mis-spent weekend tinkering with trivial concerns. I can believe that premature optimization is a bad thing, I just can’t get behind the notion that it poses a serious threat to our productivity. At least the premature optimizer is wasting their own time instead of mine.

Sir, your 4Kb RAM upgrade just arrived. The forklift is bringing it up now.

Sir, your 4Kb RAM upgrade just arrived. The forklift is bringing it up now.

I understand things were different in 1974. I didn’t do much programming back then on account of being only three years old. Processor cycles were literally thousands of of times more precious than they are today, and computer memory was more scarce by a factor of millions. I imagine those sorts of extreme limitations might lead the odd programmer to fritter away several hours of productivity worrying about individual bits, and maybe the practice was common enough for Knuth to name it as an “evil”, but during my career I’ve seen more productivity lost to pointless arguments over brace styles than to misguided efforts at optimization.

But maybe I’ve been lucky. Maybe premature optimization is still a scourge industry-wide, even if it’s never afflicted any of my colleagues. Maybe I’m just oblivious to it. Maybe I’m actually a terrible offender in this regard and none of my colleagues ever had the heart to tell me. Like I said earlier, I’m not totally clear on where you’d draw the line between “optimizing” and simply “building it right the first time”.

Perhaps an Example Would be Useful

I wonder if Tychus Findlay was a fan of Warhammer 40k.

I wonder if Tychus Findlay was a fan of Warhammer 40k.

Let’s say we’ve got a game where the level is a tidy grid of rooms. Yes, that’s a very boring design. But it’s easy to visualize and that’s what’s important for this discussion. You’ve got an N×N grid, where every room occupies one space on the grid. This means the whole thing can be stored in an easy-peasy 2D array. If you want to know about the room position 3×4, then you can just look it up like so:

//In c sharp...
Room r = my_rooms[3,4];
 
//In C++...
Room* r = my_rooms[3][4];

(Having used C sharp for a few weeks now, I have to say I prefer the CS approach to expressing array indices. [3,4] is a lot less annoying to type than [3][4], and it’s shorter by one character. And even though the C++ way is far more familiar to me, I still find the C# syntax to be more readable.)

Whatever. Simple. We just fill the array when the program starts and then we never have to worry about it again.

The problem with this design is that you have all the rooms in memory at once. That’s fine if the grid is small, but if this is an open-world kinda deal with miles of dungeonMILES of same-sized rooms? Now the game is starting to sound REALLY boring. then this is going to be a huge waste. The game is probably only interested in the rooms directly surrounding the player, and if you’ve got tens of thousands of them in memory then that’s going to incur a massive cost. It will devour memory, and the loading screen will be intolerable.

The other design you could use is to have rooms dynamically allocated as needed. Instead of creating all the rooms at startup and storing them in a tidy grid, you create them as the player gets close to them and you clear them out of memory if the player leaves the area.

The problem is that this is a monumentally more complex designEr, it’s not actually THAT complex. But it’s complex compared to the array., and that complexity will have far-reaching consequences in your code.

Now you need a background thread to load in rooms on demand, and free them up when no longer needed. Are those rooms filled with other resources like enemies, game items, or textures? Are those things used elsewhere? Can we get rid of them? We’ll need a system for tracking all of that stuff.

The pathfinding system will need to be updated. If the pathing routine is trying to work out a route from A to B, what happens if its attempted route takes it from a loaded room to an unloaded one? Should it give up? Should it signal the room-caching thread that it needs this room and then wait for it to become available? It depends on the game.

Physics needs to be able to handle having things vanish from the scene. An object might be on the threshold between rooms, and then one of those rooms gets unloaded as the player moves away. We need to make sure the physics engine isn’t going to freak out when this happens. If physics object A is resting on top of physics object B, but both of them belong to different rooms, then how do we handle if one of them goes missing? If B simply vanishes, then A will fall down to the floor. But then if the player returns and B reappears, A and B will both occupy the same space and the physics system will probably throw a fit over that. If you’re very lucky you won’t lose three hours trying to figure out why 1 in 100 NPCs are randomly killed by props travelling at mach 2.

Bill Gates never actually said that "640k should be enough for anybody", and if you look at the software his company designed it is pretty clear he had a slippery grip on the concept of finite memory.

Bill Gates never actually said that "640k should be enough for anybody", and if you look at the software his company designed it is pretty clear he had a slippery grip on the concept of finite memory.

We’ll need to synchronize these changes to the world state with the changes to the player, which might create strange, game-breaking bugs. Let’s say the player opens a treasure chest, removes the treasure, and walks away so the treasure room de-spawns. Obviously we need to save the state of the now-empty treasure room to disk, or we’ll create an exploit. But let’s say the player loots the chest, runs away, the room despawns, and then they notice their boss is approaching their cubicle so they exit the game without saving via Alt-F4Because not honoring Alt-F4 is inherently a terrible design.. A few minutes later they launch the game. The treasure room was saved to disk, so it remains empty. But the player exited the game without saving the state of their character, so they no longer have that treasure in their inventory. The treasure has vanished forever. This may be annoying or it might be game-breakingYou lost the key that exits the dungeon? Oops!, depending on the design.

There will be similar concerns with many of the other systems: Audio, rendering, triggers, lights, particles, and countless other systems will need to be designed in such a way that they can operate gracefully when objects appear and vanish without warning from one frame to the next. (Or they need to communicate with some sort of management system that’s in charge of loading and unloading. It’s more work either way.)

None of these problems are insurmountable. In fact, these problems should all have straightforward solutions. The point I’m making is that this is a lot of work compared to the naive approach of tossing everything into an ever-present array.

Note for the non-coders: From here on I’m going to refer to this other approach as using a “hashtable”. I’m not going to stop and explain what a hashtable is right now, although if you’re really curious you can read about it on your own. There are a lot of ways to store data and each of them has different strengths and drawbacks, but to keep things simple we’re just talking about two:

1) Putting stuff into an ever-present array, which is simple but can be wasteful at higher scales.
2) Putting stuff into a hashtable, which is much more efficient but a lot more trouble to implement. (The hashtable itself is trivial to set up, but making all the other systems operate in a world without guaranteed data availability is non-trivial.)

For optimal hard drive performance, take the cover off once a month and oil the disks. It'll keep them spinning smoothly!

For optimal hard drive performance, take the cover off once a month and oil the disks. It'll keep them spinning smoothly!

The thing is, you really do want to pick one and stick with it. It’s far easier to begin with a hashtable than to get weeks into a project and then try to retrofit your design into a hashtable. Retrofitting later will mean making changes to all those other subsystems: AI, pathfinding, physics, sounds, etc. Going back to old code to make changes is far more time consuming than making new code. Once the pathfinding is working, I want to be able to treat it like a magic box and forget about it so I can spend my finite mental bandwidth on other tasks. I don’t want to have to come back three weeks later and answer the question of, “So what happens if a room vanishes from memory while I’m routing a path through it?”

So early on in a project I’ll come to some sort of crossroads. I’ll have to choose between an array or a hashtable. I don’t want to do the extra work of implementing a hashtable if I’m not going to need it, since it will add a bit of programming overhead to every single thing I add later. On the other hand, I really don’t want to start with an array and then have to switch later.

Gosh, it would be nice to know ahead of time if I’m going to need the hashtable. I should probably look at how many resources these rooms will take. How many can I reasonably fit into memory before I have to worry about them? How large will they be on disk? How much processing power will each of them consume per frame? How difficult is it to pull a room into memory when the game is running? I don’t need exact numbers, and in fact getting exact numbers at this stage is impossible because a lot of systems haven’t been built yet. I just need a ballpark estimate so I have basic awareness of the tradeoffs I’m dealing with.

So I start asking these questions about memory usage and processing cost, trying to figure out what design I want to use. And inevitably another programmer will caution me against “early optimization”. “Whoa there Shamus, it’s way too early for you to be optimizing!”

Is this really “optimization”? And is it really “early”?

Is this how the other coders work these days? Just code under the assumption that you have infinite memory, CPU cycles, and storage space, and then re-engineer everything later once you start running to hard limits?

I dunno. I’m sure most programmers will concede that what I’m doing here is neither early or optimization, but just basic design work. The thing is, I’ve never figured out where you’re supposed to draw the line. When do you cross the magic threshold where it’s supposedly okay to worry about CPU cycles again? When does it stop being “early”?

Footnotes:

[1] MILES of same-sized rooms? Now the game is starting to sound REALLY boring.

[2] Er, it’s not actually THAT complex. But it’s complex compared to the array.

[3] Because not honoring Alt-F4 is inherently a terrible design.

[4] You lost the key that exits the dungeon? Oops!


 
 
Comments (107)

  1. kikito says:

    > When does it stop being “early”?

    I can try to answer that one.

    You start by picking a target system and testing scenario. Like “This complex scene with 4000 entities should run at 60FPS on a current PC”. Or “I should be able to draw and move 50 sprites on this cheap android phone at at least 30 FPS”.

    Then you do the simplest possible thing (in your late example, the array). Then you test it on your target system and scenario. If it works correctly, you continue with other thing. If it doesn’t pass the test, that’s the moment you start optimizing – and you start by firing up a profiler looking at your bottlenecks, not by “guessing” which change can make the thing run faster.

    One nice thing about having “performance tests” is that they are programmable: you can make an automated server which tests that performance automatically (possibly after every commit you push to the repo).

    • Echo Tango says:

      This, in addition to what CruelCow and thread say below, is pretty much what I was going to say. You need to *measure* how fast/slow, or big/small your system and subsystems are, not just guess. Hypothetically, you could spend a week optimizing the size of your world in memory, then your co-worker and/or boss comes in on Friday and shows you that the memory is being eaten up by the AI system and not the size of the objects in the world on disk. That would be a very awkward meeting.

    • Blake says:

      I dunno, in the games I’ve worked on, the target is the max we can get out of each console, we’d tried doing things like art budgets but as things changed those budgets would change.
      Like we need to change how a shader (or our entire lighting model) works, and suddenly we can have a lot more polygons but a lot less lights, or a lot less polygons but more instanced meshes.
      Getting games to run at 60fps is hard, so any per-frame stuff we can write cheaper tends to pay off. For a more concrete example, we used to write a lot more AI logic in Lua, but at the end of each project they’d always show up when profiling. Now if we know they might need to calculate anything slightly expensive, we write a job that can run while we’re rendering and just use the results in the AI controller.
      Some would call it premature because we haven’t yet measured (or even fully designed) the AI controller to know if it’s going to take 0.01ms or 0.4ms to run, but the more experienced of us go straight to the thing we know will be fast to save headaches later.

      It was way faster and cheaper to write things efficiently the first time than to do them twice.

  2. Abnaxis says:

    EVERY TIME SOMEONE WAVES ME OFF WITH “Premature Optimization” ACCUSATIONS I AM REFERRING THEM HERE.

  3. CruelCow says:

    The full quote makes it quite a bit clearer:

    The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.

    No, we don’t have infinite RAM or CPU etc. But we don’t have infinite programmer time either. Spending hours thinking about how to optimize a function is useless of the function only takes up .1% of the runtime – even if you magically got it to 0, nobody would notice and you wasted good programmer time that could have been spent optimizing stuff that actually takes up your resources.

    How do you find out which parts yield the most benefit from optimization? You run the program and measure. Now as you mention, some things are big changes that would mean a lot of work. Those you need to decide early, but usually you can guess from experience which way is better. To stick to your level example: you should know at the start of the project if you build an open world game or a corridor shooter.

    • CruelCow says:

      Maybe a concrete example explains it better.

      I once wrote a program that took input from a command line. I wanted to it to be fancy like git and understand shortened commands, if the desired command was still clear.

      Imagine my program knows the commands “submit”, “receive” and “record”. If the user types in “s”, the program knows that they mean submit, but if the user types in “re” the program lists receive and record as possibilites and asks the user to be more specific .

      Now I had this great idea on how to implement this in a performant way: I would create a tree, with every command as a path through, but with duplicates collapsed. So in our example the root node would have two edges – one with submit and one with rec. Submit would end in a leaf node and rec would split further into “eive” and “cord”. A great system! Super performant! Even if you added a thousand different commands, as long as they aren’t named nearly identically, the program would only need to traverse a fraction of the tree! Duplicate substrings are only stored once!

      Until I realized that it was all a waste. This program wouldn’t even have 100 commands. Even if selecting the right command took 10ms, it wouldn’t matter since it only happens after user input, once. I spent more time fixing one bug of that thing than I did rewriting it. The new version? Just a list of all commands, that I would traverse completely and compare to the input. Matches were added to a list. If the list had exactly 1 command at the end, we have a match. If more we have ambiguity. No fancy tricks at all: no sorted list, no binary search, no early stop.

      The new version was so fast the run time was within the measuring error. Not even close to 1ms. Any optimizations done to that code were a complete and utter waste. The dumb replacement was quicker to write and had equal costs: so little that it wasn’t worth spending a single minute thinking about. Yet I spent hours that I could have spent making the program better

    • MadTinkerer says:

      Another thing to consider is: You are a programmer, therefore you are working on a software product for a client running a mainframe*. Does the software currently produce results at a rate acceptable to the client?

      If no: fix it until the magic box does the right numbers acceptably quickly. (Double check they haven’t stored any of the magnetic reels near any magnets. 10% of all “bugs” can be fixed this way.)

      If yes: stop working on their software and work on another clients’ software.

      Rest assured none of the clients want to pay your extremely expensive salary for an additional two weeks making the software 5% faster than what they consider acceptable.

      *If this is not true, you must be using one of those futuristic mincomputers that only require a single forklift to move around. Our biggest clients want the biggest computers, so don’t concern yourself with the minicomputer fad.

      • Methermeneus says:

        This is why I always recommend my clients not upgrade. If you stick with good, reliable paper tape systems, you don’t need to worry about magnets, and if you have extra paper tape, you can repurpose it for your stock exchange ticker.

    • Dev Null says:

      Came by to say mostly what you said: it’s only _premature_ optimization if you optimize something that turned out to not need it, or to not improve your overall performance. Spending three days making a piece of your code run faster _that wasn’t the bottleneck_ means that those three days are purely wasted, unless and until you also optimize the bottleneck down to the point that your change made a difference. And if you’re going to do that, you really might as well _start_ by optimizing the bottleneck, and then re-evaluate.

      That said, it’s not always 100% clear where the bottleneck will be for all real world use cases. One is expected to make a certain amount of effort to design the original product to be reasonably efficient – that’s not optimization, that’s design.

      • Ingvar says:

        Not only were the three days wasted, but the code is now (probably) harder to read and maintain.

        While I don’t have any golden rule to specify what”premature” means here, my rough guideline is that I know it when I need it.

        I need to sort something? Fine, how many? Less than 100 things, I’ll just use whatever built-in the system comes with, without even checking what big-O it has, few enough that it ain’t gonna be a problem. More than 10 million? Now I know I have to consider writing something. More than a million million? I am pretty sure I have to write code.

        I need to store a game map? It’s 10×10 rooms, I’ll reach for an array. It’s something the size of GTA V? I know I will need to have something that fetches pieces dynamically off storage (or generates on the fly). Where the breaking point is, I don’t have a good intuitive feel for.

        I think the biggest danger with premature optimisation isn’t “you wasted time” but “you made things needlessly hard to maintain” (NOTE, may not actually be that relevant for game code, but is definitely relevant for code that is constantly evolved after initial deployment).

      • Erik says:

        Exactly this. Design precedes code; code precedes optimization.

        If you know your requirements beforehand, you can make reasonable decisions based on the requirements. (As mentioned above: is this open world, corridor shooter, or PvP arena? Each has different sizes and need for reloading/pathing/resource issues.) Those are decisions that drive implementations that cannot reasonably be switched between, and *have* to be done up front.

        The point of avoiding premature optimization is not that you shouldn’t consider the requirements before you design – that would be incredibly stupid. But once you design and begin to code, don’t just look at the code and think “Hey, I can do that more efficiently!” and spend a week tweaking around. Code it as simply and straight-forwardly as possible, in almost all cases. Get it to run. THEN profile it and optimize where it’s taking time.

        Tweaking code for efficiency makes the code much less clear and takes a lot of time. If that spot wasn’t actually your bottleneck, you’ve wasted time, made your code much harder to understand, and probably introduced a potential source of subtle and hard to debug issues. If you didn’t absolutely need to optimize, don’t.

    • Alan says:

      In an industry famous for months long crunch time, I think the cost in programmer time is critical.

      Bob wasting a week optimizing the main menu so it now does 200 FPS on the minimum spec machine instead of 150 FPS may be a week of work that gets dumped in your lap because Bob is backlogged.

    • Retsam says:

      To give even the fuller quote: (According to wikiquote)

      Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

      It’s not a blanket pass to be blatantly inefficient, but readability, debuggability, and maintainability should trump efficiency most of the time.

      And an important step of performance tuning, which is often overlooked, is to identify that critical 3% before you start to fiddle with optimizations: programmers often just try to optimize whatever code they happen to be looking at at the moment, without really looking at the bigger picture. Amdahl’s Law is important to keep in mind: the overall benefit you gain from optimizing a particular part of a program is proportional to how critical that part is to the overall execution time of the program.

  4. Lee says:

    Shamus, are you feeling all right? When When and that that in the same article?

    While I freely admit that Knuth is a better computer scientist when when he’s fast asleep than I am at the peak of a caffeine high

    I’ve spent a lot of time in this domain that that’s obsessed with performance

    For my taste, the line gets drawn where it impinges on the other things you mention. If your code is difficult to read or refactor because you’ve optimized it, you’ve probably optimized it too much.

    • CJK says:

      Yes, I think readability is a significant part of the context of Knuth’s remark.

      Especially when working on a low-level system (in our world, that’s embedded programming for microcontrollers – in 1974 it was pretty much everything) doing things fast will often require being less readable. A dispatch function driven by a table of function pointers is less readable than a switch-case tree. Binary bit-fiddling hacks are less readable than multiple step mathematical algorithms.

      Generally, simple structures which are easily understood are replaced by complex structures which are technically superior but harder to understand. When this happens, unfamiliar coders will treat it like magic – recite the incantations and hope. This leads to misunderstandings, confusion, and eventually breakage when somebody says the wrong magic words.

      For a non-programming-but-directly-adjacent example, consider the average user of Git. They have learnt just a handful of spells – git pull , git push , git add , git commit , and perhaps git stash and git reset. In the words of https://xkcd.com/1597/ , “Just memorise these shell commands and type them to sync up. If you get errors, save your work elsewhere, delete the project, and download a fresh copy”. It’s terrible advice, but very common, because Git is a solution so complex it will nearly always be seen as magic.

      • Shamus says:

        Ah!

        This really helps put Knuth’s quote into perspective. Thanks.

        • Clareo Nex says:

          I’ve always thought of premature optimization as premature compression/encryption. Encrypting your code for performance may be necessary, but should be done last.

          Let’s try to use your example, hopefully I don’t forget something important…

          As your location-array is a metaphor for some more complicated system, it is possible that metaphorically going north or south turns out to be unnecessary in the design. I thus implement moving as if it’s incrementing and decrementing (location = location+) and I automatically loop when I reach the end of a line of array [11,3] -> [0,4]. The room-moving function is now way more efficient, hooray.

          First, someone else looking at my code starts wondering why the player can’t go north or south. Then they have to remember whether incrementing goes left/right in the array or up/down. If we change the design later, the function has to be rewritten, and that auto-loop is going to cause bugs when (not if) we forget it’s there. A bunch of dependent code probably takes advantage of the simplification, which also has to be rewritten. E.g. what if movement_animation() only takes 0 or 1. One bit! Super efficient! Also, now part of a great bug nest. Finally, what if I think the design doesn’t need north/south movement, but I’ve made a logical error?

          In short I’ve made a bunch of assumptions which aren’t obvious in the code and may change later, for the purpose of compressing the algorithm.

          (The metaphor doesn’t quite work because we could change the array from two-value to one, which is much less troublesome.)

          By contrast, your array -> hashtable transmutation hardly removes any abilities from the code. It’s also intuitive rather than arcane – assuming distant rooms aren’t relevant comes naturally. Even if we decide to add something bizarre like realistic weather patterns, you can cajole the existing function into loading more distant rooms.

          Doing the transmutation early, as you mention, (probably) avoids having to rewrite it later during optimization, rather than introducing a new rewrite risk.

          From this we can make a preliminary checklist for optimization prematurity:

          -Am I baking new assumptions into the code?
          -Is the human RAM required for reading the code going up?
          -If we change the program design, will this have to be rewritten?
          -If we change the design, will dependent functions have to be rewritten?

          Robocraft armour might be a real life-example, except the part where I don’t know why the original design might have better performance.
          In their original design, they had ten tiers of armour, each of which was a distinct object. They changed it to one tier of armour in several distinct colours. Only since each armour colour is a distinct memory object, they can’t have a wide variety of colours. Can’t do a colour picker. (You can see the function is slightly simpler, having fewer moving parts, Load(armour_blue) instead of Load(armour, blue) but I’m still skeptical in general.)

      • Ander says:

        Interesting. Thanks for the perspective!
        p.s. I like the phrase “Think of branches as…” because “branch” is, presumably, already a metaphor for the underlying data structure. No, it’s not that simple, git guru! This comic was the first thing I put up in my work space when I started as a software dev.

      • Zak McKracken says:

        I remember reading somewhere something along the lines of:

        “Debugging is harder than programming. If you write the fastest code you can, it will be at the boundary of your ability to program, which means it will be beyond your ability to debug it.”

        I don’t completely agree with this because you don’t always need to completely grok your code to debug it, and there’s not really a hard boundary to a person’s ability to understand something but generally code becomes more complex if it needs to be faster (think of the raster interrupt tricks people did on the C64 to get more sprites onto the screen than the machine had registers for…), and if you did that sort of thing before you know that it’s required, you’re just making later attempts at improving the code more difficult, which, given a finite amount of total programmming time, reduces the quality of the final output.

        I could also imagine that people with tendencies towards premature optimization were (and still are) more frequent in programmer circles than elsewhere, which would amplify the problem.

      • Lupis42 says:

        Having spent some years now working on, in various ways, making programmers more productive, I think you vastly underrate the amount of time that one person spending a few extra hours of performance tuning can suck up. The problem is that it isn’t usually visible at the time, it comes up later when you’re running into a problem that should be fairly simple, but requires a bunch of refactoring effort first because the programmer found a way to get something to be ~50% faster or use ~25% as much memory. Frequently, they turn out to have done this by sticking a bunch of things that are related but do not need to be interdependent into one class/function/module in way that makes it damnably difficult later to separate out something, typically something that was clearly likely to need to be separable from the beginning. Other times it’s just about subsequent maintainability – a clever performance hack that saves ~$10k of hardware budget, but takes the team that has to make a change later a month, instead of a day, to properly test is an optimization that has probably cost on the order of 10x as much money as it saved.

        Now, I’m also biased because I work on a lot of cloud-based business software, so while we don’t treat CPU, memory, or disk as infinite, we certainly do tend to be able to clearly quantify the cost of using more of them, and can easily compare that to the cost of paying people later to spend more time on maintenance, testing changes, etc. The closer you get to hardware, the harder it is to substitute money for optimization, and the less future maintainability you need, the less important it is to pay extra for clean code, separation of concerns, etc. My gut, however, is that 99% of programmers, and 99.99% of new programmers, are underestimating the amount of future effort relative to the current performance gains.

    • Shamus says:

      The last few weeks only got 1 round of proofing, as opposed to my usual 2-4 rounds. I’ve blown through my lead time and now I’m hammering out articles just before they publish.

      I always have a terrible time balancing programming and writing.

      • DaveMc says:

        Well, Knuth always liked to say that premature writing was the root of all evil …

      • PeteTimesSix says:

        I kind of like it, actually. The last two articles ended up with an open-ended question as opposed to a conclusion…

        …which means that if I dont agree, I dont have to disagree, Im just offering an alternate perspective for consideration. Which is nice.

        (Yes I have written entire paragraphs only to delete them because I didnt want to have to say “youre wrong” before. I am bad at zee socials.)

        • Pete_Volmen says:

          Ditto. This blog is pretty much the only one where I read the comments, and the open questions really suit it. Reasonable relaxed conversation, no echo chamber but people with multiple viewpoints attempting to understand each other and make themselves understood.

          Even aside from the comments, these off the cuff posts are a pleasant read, and quite informative (even where I disagree I still find new arguments for any side of the conversation).

  5. Bodyless says:

    Your Example is a bad one. It assumes that whoever writes the program does so poorly.
    Ideally it should not matter how the data is handeled in the background for the rest of the program. Switching between an array an a hashtable should not require you to rewrite everything.
    See here for further information: https://en.wikipedia.org/wiki/Separation_of_concerns
    For Example: You can define an Interface for handling the data of your rooms. This will contain any function that you might need. The class(es) implementing the Interface then can be optimized as needed.

    The bigger problem is how you define “optimization” and “done properly”.
    Software Architecture is not “optimization”. And “done properly” may not require something to super optimized.
    Building hacks to speed up the game by 2%, which then have to be changed everytime you add a feature is certainly bad.
    In the end it is far more important for your program to work properly. This is easier to do without optimizations which could have beed added later.

    • Nick Johnson says:

      I came here to say this.

      I’m actually pretty curious about how well a philosophy like redux would work for building a game. There’s a good chance it would add a lot of overhead, but might add stability gains which are worth it.

    • Shamus says:

      “Ideally it should not matter how the data is handeled in the background for the rest of the program. ”

      You’re basically saying you should always do things the hard way, weather the extra abstraction is needed or not. This sounds worse than premature optimization, since it means building layers of abstraction (and enduring their overhead cost to code / run) where it is not needed.

      Sometimes it’s fine to pull stuff out of a simple array and get on with things. (At least, this is true in my domain.)

      • Bodyless says:

        “You’re basically saying you should always do things the hard way, weather the extra abstraction is needed or not. This sounds worse than premature optimization, since it means building layers of abstraction (and enduring their overhead cost to code / run) where it is not needed.”

        Yes, i would argue that not using abstraction is a form of premature optimization. The overhead is rarely that signifcant on modern systems. There are probably bigger concerns.

        Granted, i never worked in the video games industry. But as a professional c# software developer it often feels strange when reading about game development. There are so many recurring errors and problems which look like they could be easily solved with proper sortware architecture.

        • Blake says:

          As a game developer we basically feel the opposite.
          Adding unnecessary layers of abstraction leads to overengineering, which makes things more complicated to follow later, as well as introducing performance overheads.

          I’ve spent way more time trying to debugging complex systems that were designed to be simple to use but are full of abstractions to do it, than I have debugging something simple.
          If I see a function takes a pointer and a count of elements, it’s trivially easy to step through the logic and know that it’s just stepping over an array. If it’s something that takes a pair of iterators that could be any number of different types, there is way more space for bugs and confidence in the code is lowered dramatically as a result.

      • Echo Tango says:

        Abstractions are a way to keep everything simple. You said yourself, that you want to treat the pathfinding as a black box, then move onto something else. However, in your example, the way you store your data is not abstracted, and leads to a large amount of tightly-coupled systems. I think the cost of that spaghetti is higher than an abstracted data-storage system.

        • Shamus says:

          Abstractions are also a way to make things more complicated. As I pointed out above, it’s more work to make a system that can operate when data availability isn’t guaranteed. When combined with multi-threading (a given in the context of a game) this complexity is multiplied and you have all these thread sync concerns that you’d never have to worry about with a simple array.

          It really does depend on how you’re using the data and how many systems need access to it.

          • PeteTimesSix says:

            Usually the way I handle this is by writing the simple solution and refactoring into a abstracted black box the first time I actually have to make a big change. Keeps writing abstraction code to places where it actually helps and is usually surprisingly simple with a bit of Visual Studio magic.

            • Gordon D Wrigley says:

              This.

              What’s missing from this bit of the discussion is the observation that premature abstraction is also a serious problem.

              A great example of this I ran into some years back is we had a system that was storing stuff in version control specifically Subversion. One of the engineers had an abstraction addiction and so built an abstraction layer around Subversion. And because we were only using Subversion that abstraction layer essentially just codified how Subversion behaves.

              Time passed and this little thing called Git came along. And because Git has a quite different model of the world than Subversion that abstraction layer which took a bunch of dev hours to build and which was supposed to help in exactly this moment was instead a liability that had to be worked around and hacked. And eventually ripped out entirely to be replaced with a new abstraction that captured what was common about the systems interaction with the two different version control systems.

          • Methermeneus says:

            This comment thread needs some Jon Blow references; he explains this better than I ever could. Basically, there are two problems with abstraction. (There are benefits, too, so this isn’t an argument against abstraction, but against assuming that abstraction is automatically good without weighing the costs and benefits.) Both of these problems are particularly problematic for games development.

            The first problem is that abstraction is, by necessity, a form of obfuscation. Now, for most purposes, obfuscation of complex systems is a good thing: You don’t need to know what the registers, memory, drivers, etc. are doing to write a word processor, and your don’t want to know those things for a web framework, where you don’t run the servers, let alone the clients. On the other hand, games require meticulous memory management and elimination of bottlenecks smaller than most other applications have to deal with. If you’re using abstractions that you can’t open up and examine (ie: black box abstractions), then you can’t fix bottlenecks on a sub-millisecond level, which games sometimes require. Often, the best way to use open abstractions is either to build them yourself in the first place so that you can open them up again and make changes if necessary later on (architecting ahead of time, as Shamus is saying he does) or to run through the procedure you plan on using and pulling out appropriate snippets into abstractions as you go (what Casey Muratori refers to as “compression-oriented programming”).

            The second problem with abstraction is that it incurs overhead. Like a couple of people have said already, that doesn’t usually matter on systems (fast) in most contexts (don’t need to be fast, or involve passing data over the internet, which automatically causes a delay large enough to make most other latency irrelevant). However, like I said before, games need to be blisteringly fast. Even some 2d games can take over your cpu and memory while active, and 3d games will eat both resources along with your gpu and vram. (Older games might not, but the trend has always been more towards using faster computers to use more resources at the same barely-enough rate than to run simpler games faster or with a smaller percentage of system resources.) Also, honestly, even in other contexts, the overhead of abstraction is starting to slow things down, as programs are built of abstractions built of abstractions built of abstractions built of… anyway. (Okay, maybe you don’t want to hear Jon Blow talk about that; he feels strongly about it and gets rather heated in the subject.)

            Again, these don’t automatically make abstractions bad. My computers are full of cool stuff made from layers upon layers of abstraction, and a phone’s OS is basically three layers of abstraction enabling two more layers for each app, which are all programmed using several layers of abstraction, and I don’t want to go back to a world where all of that can’t work. However, maybe now you have some idea of why game programmers don’t like them quite as much as everyone else.

            • Zak McKracken says:

              Was thinking of John Blow, too. Effectively, every project has an optimal level of abstraction for its different components, based on the complexity of the task.
              If a program grows and has more functions added, more abstraction helps to keep the added complexity manageable.

              But starting out with the “optimal” level of abstraction is hard, and adapting it whenever you realize that you didn’t get it exactly right can also be a form of premature optimisation.

          • Matt says:

            It also makes a huge difference whether you’re working on a small subcomponent of a personal project versus something that will be used by a much larger team. Abstraction for its own sake matters a lot more when your code will be used 5 years from now after you’ve left the team and aren’t around to answer questions any more.

            • Mistwraithe says:

              Exactly. Sure games need performance and that is an argument against abstraction, but traditionally many have also been throw away projects (cf business software). Version two of the game will usually be several years later and quite often starts again with a more advanced engine. Even if it is on the same engine it will almost always be a case of copying the bits they want from version 1 at the start and thereafter ignoring the old game.

              Whereas business software can easily have the same code base worked on and improved for a decade or two. If it is done well then the most important business logic may evolve over that time to be used and shared between multiple different end user programs. In this context the benefits offered by abstraction are vastly more important as it allows the internals of different components to be rewritten over time without breaking everything else.

              I would posit that games development is actually slowly becoming more like business software. If you look at games like Crusader Kings 2 and Europa Universalis 4, the development lifecycle of those games is very similar to business software. Even games with a shorter lifespan are more likely to be on an engine (eg Unreal, Unity, an inhouse engine) which has a longer lifespan.

              • Echo Tango says:

                Even if you’re using a new game/graphics/whatever engine, it shouldn’t mean you need to rewrite your game/sequel every time you switch. You should be as loosely coupled to specific tech as possible, so that you can keep your game’s business logic from game to game, regardless of tech changes.

  6. Scott_C says:

    Depends on the industry too. Game code, as you’ve shown, is tightly integrated. You can’t change one thing without changing everything it touches so it’s important to do things right from the start.

    But a lot of custom internal business software is built in semi-isolated modules. Tool A generates a report. Later on tool B consumes that report and as long as the report is correct it doesn’t care what data structures or algorithms were used to build it. And since tool A and B both only run once a day there is very little need to make them fast. In this situation you really can get away with pretending you have unlimited computational resources until proven otherwise.

    Sure, it feels a bit sloppy but sometimes writing naive code fast is the only way to make sure everyone in your company gets the tools they need on schedule.

  7. evilmrhenry says:

    I always figured that he was talking about line-by-line optimization, where you take a function, and replace the x += 1 with x++ because that saves 1 cycle each loop with the compiler you have and so on. (Probably a bad example, as the compiler should be able to manage that, but whatever.) There are times when stuff like that is important, but it should only be done if needed for performance. For example, a few weeks ago I changed a function to return a tuple instead of a class, because that was in the inner loop of a rather heavy function; that change shaved seconds off of the user-visible response time because it no longer needed to create then destroy half a million class instances.

    Architecture-level decisions are not optimizations.

    • Methermeneus says:

      Well, I mean, I do those sorts of optimizations because I think they’re fun, but, yeah, that’s something for my spare time, not when I’m in the middle of a project. (Also, ++x is faster than x++ in contexts where you’re not doing anything else dependent on X’s value on the same line. Yeah, I went there.)

      • Sean Conner says:

        Citation needed. On some architectures (say, the Motorola 68000), a sequence of code like “x = *(++p);” (increment the pointer, read memory contents) takes longer because it requires two operations (increment pointer, read memory) whereas code like “x = *(p++);” (read memory contents, increment pointer) can be done in one instruction (and the post-increment nearly darn free). On a modern Pentium system, however, it really makes no difference in time (because the instruction that can do the “read memory, increment pointer” is *so* specialized (in that it requires certain registers to be set up in just the right way) that compilers will rarely use it).

        Also, be aware that compilers are really good these days. A few months ago I played around with the vector instructions available on Pentiums, and I *still* could not beat the compiler that didn’t even bother to use the vector instructions. Sigh.

        • Methermeneus says:

          Sorry, you’re right, I do need to specify: ++x is faster than x++ on x86, specifically, if the difference isn’t optimized away. Increment is a single instruction, so the difference is between “retrieve address contents; increment; do whatever” and “retrieve address contents; do whatever; re-retrieve address contents; increment.” Generally speaking, if “do whatever” is no-op, an optimizing compiler will cut that part along with the now-unnecessary re-retrieval and just do the same thing for both, but even if the compiler is as naïve as can be, the one extra instruction is not anything to worry about on a modern system. That said,
          1) Pre-increment is still technically faster (if negligibly so, and only with the naïvest of compilers)
          2) Bringing it up at all in this context was a joke, and your response makes me think it fell extremely flat.

  8. Sardonic says:

    This is a fun question, because nearly every programmer out there has a different interpretation of when optimization is still “premature.”

    I think if you are adding a lot of complexity to your code, early in the project, with the aim of making it run faster, before you know what you’re really building, you’re optimizing prematurely.

    For example, to keep it in the domain of game development, if you’re experimenting with a way of procedurally generating puzzles and you’re toying with different algorithms to see what will consistently generate usable puzzles, you are optimizing prematurely if you decide to hand-unroll all your loops and replace your function calls with gotos so you can share little regions of code and theoretically make it run faster. Your goal is to figure out an algorithm that will work — spending hours trying to make it run microseconds faster is a waste of time in its own right. Compound that fact with the reality that, later on, you will be looking at this code trying to describe your algorithm simply so you can port it into a new game in a different engine in a different programming language, and you can see that you’ve wasted your time on many fronts, and gained nothing of value for it.

  9. Knul says:

    In my experience in web development, yes, you just assume there are no CPU or RAM limitations for the majority of time. Mostly, you keep an eye out for not-so-optimal SQL queries, as those are the source of the majority of performance problems.

    Only if there is a noticable problem with performance do I get to optimize performance, but that is with less then (say) 5% of the code I’ve worked with.

  10. Ander says:

    I did not see all the hubbub about class size to be about premature optimization. I saw the primary concern to be that people thought you were optimizing in the wrong way (i.e. by asking question of C# that, due to its design, it is not well-suited to answer). It’s not primarily that the optimizing was seen as premature; the optimizing was seen as ill-conceived for the language. Of course, as you said in a comment, getting the element size was not seen by you to be an optimization concern; it was a simple question. Seeking exact memory size is more precise, and that might be why many said, “Wow, Shamus, that’s more than you need to worry about right now.” But again, I think the biggest concern expressed in the comments was, “That’s not how C#/Unity devs optimize. Attempts to do it this way will not benefit you as much as using [whatever] will because this dev pipeline is not meant to deal with.” I’m a C# dev who is not in a business that cares much about precise memory usage, so I can’t speak to one side or the other.

    My apologies if I assume too much; it seemed reasonable to connect this to the post from two weeks ago.

  11. Piflik says:

    Fun fact: Starcraft was originaly a Warhammer 40k project. But they didn’t get the license so they made their own version of 40k.

  12. Bloodsquirrel says:

    Part of the supposed danger of premature optimization is that it results in more complex, harder-to-understand code. Which, in the context of the original quote, when they were writing in unstructured languages and dealing with cycle-level optimizations, was probably true.

    But, like many things in computer science, it was taken dogma and now people repeat it without thinking to excuse being lazy.

    IMO, optimization is not premature when:

    1) You understand the problem domain well enough to know ahead of time that, yes, this part of the code is going to be performance-critical.

    2) You’re building structural elements of your program (often at the interface level) that will be difficult to change later on.

    3) You’re establishing usage patterns that you’ll want to be consistent.

    4) The optimized code is just as (or more) clear and maintainable as the unoptimized code.

  13. Alan says:

    Database folks are often incredibly fanatical about efficiency. It’s a big deal when running monstrous, heavily used databases.

    I think Knuth would agree on doing things right the first time. He did write an entire series of books on selecting suitable algorithms. I fear too many people are remembering the pithy summary but not the entire context.

  14. Daemian Lucifer says:

    Is this how the other coders work these days? Just code under the assumption that you have infinite memory, CPU cycles, and storage space, and then re-engineer everything later once you start running to hard limits?

    Have you seen modern games?That is exactly how they are working.

  15. Richard says:

    The real meaning behind the quote is this:

    Optimise the bits that need optimising. The rest just needs to work.

    It’s usually harder to make something fast/small and correct than to make it merely correct.

    Something that only ever happens once probably doesn’t need optimising for time.
    Something that only has one instance, probably doesn’t need optimising for size.

    Something that happens 60,000 times a second, or has 10,000,000 instances, probably does.

  16. Daemian Lucifer says:

    but what’s the actual damage of premature optimization?

    Simple answer:Half life 2 episode 3(or half life 3,if you wish).

    If valve werent so pedantic about optimizing their project before it even started,trying to think of THE ONE TRUE perfect way to make that thing happen,it wouldve happened 2 months after episode 2 concluded,and we would not have this awful cliffhanger hanging over us for all these years.Thats hardly wasting time of just the programmer.

  17. Daemian Lucifer says:

    And once again,it falls on me to link the relevant xkcd comic:

    A table of when optimization is early vs when it is ok

    If you are about to waste more time optimizing than is mentioned in that table,its too early.

  18. Primogenitor says:

    As well as the comments above about programmer time and readability, consider modern compilers.

    Something simple and theoretically slow but which a compiler can optimize, can end up being *faster* than a more complicated and theoretically better algorithm that the compiler can’t fully understand. A lot of smart people have put a lot of time into compilers, including just-in-time compilers in interpreted languages like Javascript & Python.

    Libraries have a similar thing – typically its better to use a generic library rather than trying to implement something specific to your application, even if you think you can see some optimizations that only apply to your specific domain.

    There’s other issues too – maybe your optimized code was 25% faster on your 5-year old development machine with a NVidia 451 but 200% worse on the latest AMD FooBars, which have only just been released and are going in the next PBone console.

    • Blake says:

      “Something simple and theoretically slow but which a compiler can optimize, can end up being *faster* than a more complicated and theoretically better algorithm that the compiler can’t fully understand”

      This is one of the reasons too much abstraction and over-engineering can be bad.
      If each layer of your code is too separated for you to be able to read, it’s probably too separated for the compiler to work with efficiently as well.

      This is one of the reasons why, if you know a flat array is all you need for your use case, it’s better to just have one of those than a more complex but easier-to-use structure.

    • Methermeneus says:

      I would argue very strongly against this particular point. Libraries are often terrible in terms of optimization. Simply because they must generalize for the completely unknowable use programmers X, Y, and Z will use them for, they will have all sorts of branching code and extra (and encapsulated) function calls, which makes for bad optimization. Worse, they often require all sorts of weird parameters and nearly-the-same functions, which makes them difficult to use outside of a core subset that most people use them for (which makes the rest of the library either unnecessary bloat in your own software, if statically linked, or difficult to port, if dynamically linked). This is only slightly better in languages with better library import like Python. Probably the best example of this is Boost, which has quite a few useful bits and pieces (and most are even well-optimized), but holy crap does it take forever to compile, and your final executable is huge.

      Worst of all, you’re assuming that the libraries are made by a bunch of people smarter than you who have spent years perfecting every aspect of their code and doing that optimization that is so unnecessary for people making the average end product. Here’s something important to remember: The authors of most libraries are not the authors of the highest-quality compilers. They make mistakes, they skip optimization for convenience or readability, and they assume that what’s readable to them will be readable to everyone, even if they are used to a different codebase with an entirely different style guide. And, just like programmers everywhere, they hate documenting when they could be programming.

      My (least) favorite example of that second paragraph (the whole thing, really) is OpenCL. OpenCL was once a C image manipulation library, one that was pretty easy to use. Some basic C tutorials and books actually had OpenCL projects in them; I remember one that captured and played back webcam footage as a series of JPEGs, which is really cool for a book that starts with “Hello World.” The authors of OpenCL, in their infinite wisdom, decided to port the whole thing to C++. Suddenly, functions took different parameters, everything was a class whether it needed to be or not, templates were everywhere, which C-compatible functions were kept, deprecated, or removed entirely seemed entirely random, and the documentation for half of what the library provided amounted to little more than copying the function prototype or class definition.

      No, using a library does not always make your life easier or your program better. Using a generic library is better iff it is easier to use than rolling your own and/or it actually provides some improvement to your final executable (depending on which is more important to you in the specific context), but you absolutely must determine if either of those is true rather than just assuming it’s better to import, and I’d argue that it doesn’t even make a good rule of thumb.

  19. adam says:

    Not directly related, but I feel there’s a corollary in the whole array vs. hashtable here in over-engineering, which is something I feel like I’ve been fighting against my whole career.

    So many developers LOVE coding solutions for problems they aren’t currently dealing with. They love to imagine how this or that bit of code with a narrow application might be abstracted to handle a more generic premise. Enterprise software is notorious for this. It is pathologically over-engineered, as if the developers think the only way they can earn their keep (or safeguard their jobs?) is to bury the salient, functional parts of their application under a hundred layers of cruft.

    Unfortunately, in my experience, nine times out of ten all this does is make the code inscrutable and less maintainable for the poor person who has to deal with it sometime down the road. So what ACTUALLY happens is that someone (often the original dev) goes to add some new functionality to this code or fix a problem, and due to the amount of abstraction it takes them triple the amount of time to understand exactly where their new code should fit in. Our new developer then also finds that all that extra work our original developer did in the hopes that it might help out later on doesn’t quite fit, so this new developer now has to spend a bunch of extra time following the superfluous code paths in various directions, code paths that don’t actually contribute anything material to the end result, to add the new functionality.

    I always tell the developers I work with to code to the problem at hand, and worry about contingencies later. Put your effort into making your code more readable rather than one-size-fits-all. Then, LATER, if the need to make it more generic actually comes about, THEN you can add your abstractions. Not only will they be easier to add to your existing, eminently understandable code, but the abstractions can be done with a specific purpose and therefore done more quickly and in a more readable fashion themselves.

    The vast majority of the time the original code is modified just a small amount, if at all. Let it be what it is and don’t try to make it something else. You’ll just confuse everyone.

    • Phil says:

      In the vein of “coding solutions for problems they aren’t currently dealing with”, and this is more a design level thing than necessarily the programmers….

      I used to work for a company that bought a small suite of applications (for scanning, indexing, and archiving images) from another company (who themselves had acquired it as part of an acquisition of another company). One of the server-side pieces was a workflow server; other apps would connect to it, log in, indicate what step of the workflow they were on, get a token (with some built-in data), process, and return the token with a result.

      This server was only ever used as part of this suite of apps, but the original company decided they wanted to try to market the workflow piece standalone. AFAIK, no one ever wanted or bought it, but we were stuck dealing with using this service (and maintaining/updating the code), plus a second that supported some of the file-access requirements of the system, when things would have been a lot simpler if they had both been built as one service specifically tailored to the suite.

      Good times.

    • Retsam says:

      I had a coworker notorious for this. Everything he did was a “reusable utility” usually with several optional parameters to control the behavior… but in practice it was only ever pulled in and used in one place with one set of parameters. When he left, one of the first things I did was replace a 100 line “util” file with the 8 lines of code that it actually translated to, once you removed all the indirection.

      Besides making the code more verbose and harder to reuse, the danger of coding for future hypothetical use cases is that you often guess wrong about the needs of future code. You end up writing the “wrong abstraction”, anyway. I’m a big believer that “duplication is cheaper than the wrong abstraction”.

      Personally, I generally run by a rule of three: if I’ve used the same bit of code in three different places, it’s a good idea to to try to abstract it into something reusable. And if something doesn’t fit the pattern later, it’s better to not use the abstraction (even if it means more duplication) than to try to stretch the abstraction to fit a case it wasn’t designed for.

    • Zak McKracken says:

      I like to think about generalizing code while I write it, but don’t actually generalize it.
      I merely go very little ozut of my way, if at all, to make code which can be generalized

      Example: I have two functions which do similar things, but not quite the same. So I’ll try to structure them as similarly as I can think of quickly, and leave a comment in the code about what would be needed to generalize them. Then, if I later realize I need a third or fourth functions which follows the same scheme, I rip out the guts from one of the existing functions, generalize/parametrize them or turn them into a class if appropriate. Plus points if it can go in a library which is useful for other purposes, too.

  20. Kathryn says:

    So, I am a project manager. And what you are describing looks a lot like something we PMs call “scope creep”. Now, I’m not anywhere near game development – in fact, I do hardware and rarely interact with software development at all – but if any designer is spending hundreds of hours on perfecting something that is good enough as it is, I will definitely be redirecting that person.

    The difficulty is that as a designer, you want your piece of the project (whether it is a cable or a CAD model or a chunk of code) to be perfect. (And I sympathize with this because I am a perfectionist also.) But the project manager has to take the higher level view, where your delay on delivering that CAD model means I can’t get my drawings released, which means I can’t get my hardware built, which means the entire project is late. If you are on the critical path, you can’t let the perfect become the enemy of the good.

    This is a very hard balance to strike. There absolutely are times when stopping, taking a deep breath, and planning IS going to save you more time than it costs. (We spent a ludicrous amount of time developing requirements on my latest project, and it paid off.) But there are also times when you have to get moving even if you end up having to correct down the road, because you will lose more time trying to get it right now than you will getting it 90% right and fixing the last 10% as you go.

    Unfortunately, the only way to know for certain which time is which is with the benefit of hindsight. In the moment, you have to rely on experience and gutfeel. This was the first and probably the hardest lesson I’ve learned in my career so far, because my maxim was NEVER “the perfect is the enemy of the good” – it was “good enough never is”. But that’s not true. Good enough is indeed good enough.

    • Steve C says:

      Reading Shamus’ blog I feel it is a step before that. And that fact is getting ignored by all the programmers responding in this thread. It is attempting to get enough information to make an informed decision on how much time to spend. If getting that initial piece of information is part of the “optimization process” then everything is a recursive “premature” problem with no solution.

      How fast should we set our flying speed to make our arrival window?

      Well how far away is our destination?

      Whoa whoa whoa buddy. There’s a dozen different routes we could take. That’s premature optimization.

      Umm no. It’s some fact finding.

      • Kathryn says:

        I was responding to the general concept of “premature optimization” (which, again, as described here sounds to me like another name for scope creep) rather than to the specific situation Shamus appears to be in with regards to whether he should store things in memory or regenerate them if they’re needed. I really can’t comment on the latter in any informed way.

        Unfortunately, sometimes you have to make a decision with incomplete information. I don’t like those moments – I always want more data – but it’s an important skill for a project manager.

        • Steve C says:

          Well you’re not wrong. It’s just that is not (IMO) the correct topic. The topic of this post is “Preemptive Premature Optimization.” It is written as a self-check. However the thesis is that a whole bunch of things that are not part of “premature optimization” are being lumped into it. That the term is being used too broadly. The point is it is a misnomer causing Shamus frustration and the blog post is a vent. The specific situation Shamus is using is an example to illustrate that point. That it isn’t optimization at all. Just basic information about the resources/constraints in order to conceptualize what is going on.

          Using the project manager analogy, there are project milestones to be reached at certain times. Not knowing what those milestones are nor when they are due would be weird. This is like asking for milestones you know exist and being told you don’t need that information.

  21. Daimbert says:

    Looking at your example, I think it’s important to consider the difference between designing for scalability and optimization. In your example, you’re thinking about the way to store the rooms because you’re thinking about what’ll happen if you get a lot of rooms per level. The first question you have to ask is: how many rooms are we likely to have per level? Is it a small number, medium number, large number? If you’re only going to have something like 20 rooms — ie you aren’t actually making an open world game — then take the easier solution and don’t worry about swapping rooms in and out of memory. The second question to ask is, indeed, how large these objects will be, but you don’t need to know exactly how big. You just need to know if they’re likely to be small or huge, roughly, or how much of a percentage of your memory usage compared to the other objects you have they will use. You do ask the third question, which is how hard will it be to convert to the other model if you need to scale later. And the final question is how much work it will take to make it scalable. At the end of all of this, you should know whether you need to care overmuch about scaling and whether you should build that into your design. In my opinion, the answer usually comes down to making it so that your design CAN scale to the levels you think it might have without having to rewrite the code without having to implement all of the scaling mechanisms — and testing them — in the first release.

    The same thing applies to future functionality, if you have it: you don’t have to implement it, but a good design will let you do it without having to rewrite everything.

    There’s another issue with optimization: there’s really no point in doing it until your product works and the functionality is mostly implemented and you don’t foresee much more churn. Why? Because you can spend a lot of time optimizing code or even your data structures … and then discover that it’s not going to work and then rewriting all of that. To take your example, you might do all of this, and then discover that you can’t actually build this as an array of rooms at all because of how the rest of the code has to work, and so you need a different class structure entirely. At that point, you’d have spent a lot of time optimizing code and structures that you now have to toss away.

    “Doing it right the first time” is design, and is generally scalability and using best practices to maximize performance. Optimization is when you want to do it the most efficient way possible — generally to hit metrics — after the code is set to get more performance out of the system.

  22. Jack V says:

    I think, as with the all aphorisms, recognising “too much” is a matter of experience, and there’s no clear rule for “too little” or “too much” — but the aphorism can be useful in reminding people that the right amount of whatever-the-aphorism-is-about isn’t “as much as possible”, but a balance.

    Based on what you said, you are in fact choosing a sensible trade-off here. And I mean, even if you’re not optimising as such, it’s reasonable to want to know to an order of magnitude “how slow copying structs is” and “how much storage space structs use”.

    I don’t know if it’s a universal scourge, but I definitely think optimising when it isn’t needed is a common problem, simply because if you optimise in any way that involves more code, the extra code is an increased burden on understanding and maintenance. When I last worked on a big high performance system, ONE stage was a bottleneck, enough that the rest didn’t really matter. We fortunately didn’t spend any time optimising them, even though we had many ways of improving their performance that we’d expected to need because what we were doing was just slow, but it turned out, it was slow in a place that didn’t matter.

  23. Asdasd says:

    Not being a programmer, this was another of those columns where I’m not really able to follow a great deal, but suddenly I realised this is something that happens to me all the time in games.

    I’ve spent hours in a many a JRPG trying to work out how to optimise tedious random encounters to shave off as many turns as possible. Which spells have the greatest damage/animation frames ratio? Should I be balancing my party, or dumping resources into the characters fast enough to act before monsters do? Can I solve the mana overhead to get me through this dungeon as quickly as possible without running dry?

    I always end up coming to the same conclusion, namely that whatever my success, I’ve surely spent more time than I could have ever hoped to save.

    • Kathryn says:

      I am a completionist, and as such, I am very familiar with this problem and often make spreadsheets for video games so I can efficiently complete the remaining tasks. (I still have the one where I worked out the most efficient order in which to equip Espers in FFVI. And for FFXII, I went to great lengths to minimize the number of trips to the Great Crystal.) There is a point at which it starts feeling too much like work…

      • The Rocketeer says:

        I was confused for a second and thought maybe I had posted this comment and forgotten about it. But no, I guess there is just at least one other person out there broken in the exact same way as me.

        You, uh, mind shooting me that FFVI one, by chance…?

  24. Jay says:

    I think the “evil” comes in when you’ve spent 500 hours writing code for a hashtable and you realize you’re going to need a full grid (or vice versa). In other words, when you’ve made expensive decisions based on an incomplete understanding of what the project needs.

    I once had a manager tell me that they had bought a new $750,000 machine for me to use in my work. I had six months of funding left at that point, and the new machine eventually took someone else over a year to set up and get stable. Also it used technology that required a security clearance, which I didn’t have and couldn’t get (there was a two-year backlog at that point in history thanks to 9/11).

    • Jay says:

      You probably don’t see much of this sort of premature optimization in games development any more. People have made enough games that it’s usually possible to make reasonable guesses about what will be needed, at least at a high level. Still, there’s always some idiot who will write code around RaymondSort because it’s proven to be 0.1% faster in single-threaded environments with less than 10 megabytes.

  25. Rick C says:

    Knuth didn’t mean “don’t optimize until you’re done.” He meant, among other things, “don’t try to optimize small improvements in non-critical areas.”

    One common thing is guessing where the slowdowns are and optimizing what you think is slowing you down, rather than profiling to find where the actual bottleneck is. People do that all the time.

  26. Blake says:

    I think what you’re doing counts as designing, not optimising.
    I’ve worked in a games studio for over 10 years now, during that time the only things I’d really count as bad early-optimisation would be when people spent a bunch of time trying to use math intrinsics for functions that really didn’t need them.
    The rest of the time the costs of writing cheaper code was trivial.

    Actually generally speaking, at the function level the fast code tends to be the simple, easy thing to write, and you sometimes have to intentionally de-optimise your code by adding abstractions to make things easier for the next person who works on it.

    At the interface level, over-engineering is a far far far far bigger problem than over-optimisation. Every abstraction is a new concept you need to understand, and bad abstractions can lead to a staggering amount of difficult-to-read, hard-to-maintain, hard-to-refactor, hard-to-debug, hard-to-trust, slow-at-compile-time, slow-at-run-time, bad bad code.

    Now having said all that, if you’re passing something like a big container of ‘Rooms’ all over the place that you may want to change the data structure for later as requirements change, then it should probably be in it’s own type that hides that detail. That seems like a reasonable abstraction that’ll save on refactoring later.
    However if it’s only going to be referenced in a few places then go with the simple option because it should be easy to switch to the complex one later.

  27. Sestren says:

    Rather than trying to ‘interpret’ Knuth’s statement one of half-a-dozen ways as has already been done here, I would like to post some of the context around it. Here’s some of the surrounding paragraphs from the paper the line appeared in, “Structured Programming with goto Statements”.

    “The conventional wisdom shared by many of today’s software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by pennywise-and-pound-foolish programmers, who can’t debug or maintain their “optimized” programs. In established engineering disciplines a 12 % improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering~ Of course I wouldn’t bother making such optimizations on a oneshot job, but when it’s a question of preparing quality programs, I don’t want to restrict myself to tools that deny me such efficiencies.
    There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
    Yet we should not pass up our opportunities in that critical 3 %. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.”

    Draw your own conclusions from that.

  28. J Greely says:

    I always go back to Jon Bentley’s book “Writing Efficient Programs”. Like his “Programming Pearls” books, it puts a lot of that classic advice into context with clear examples. Well worth a read.

    We used to have to deal with an 8000-line Perl script that called an 8000-line C++ library, and while our entire group understood the inputs, outputs, and basic transformation from A to B, no one could figure out why it ran for 25 hours a day, chewing up 3 CPUs and 3 GB of RAM (18 years ago, when this was a much bigger deal than it is now…). Anyone who tried to dive into the code discovered that it was filled with optimizations that obfuscated the logic, including a bunch of internal state-tracking so it could (theoretically) restart in the middle if it crashed for some reason. It was also capable of reformatting file systems, because that was faster than deleting the previous day’s output (in fact, due to a file system bug, if you didn’t reformat occasionally, 25 hours eventually became 40).

    The day I had to finally debug it, I died a little inside, especially because the state-tracking optimization was a little off, and we had to run it twice after making each change to see the results. If we hadn’t had a smaller dataset to test against, we’d probably still be working on it today.

    The problem? Statically allocated buffer that was just a wee bit too small for certain combinations of input data, and the C++ library didn’t do any error checking until it tried to return the results to Perl. Because checking for errors is slow…

    We eventually managed to get the runtime down from 25 hours per day to 4, by buying faster servers.

    -j

  29. Agammamon says:

    what’s the difference between “premature optimizing” and simply building something properly the first time?

    Mainly whether the project will take 10 times as long as you planned for or 100 times as long.

    If you don’t do premature optimization then it’ll only take 2 or three times as long as you planned for.

  30. Mephane says:

    The quote says “premature optimization”. If you know you will have miles of rooms and can reasonable estimate that they will never fit into memory simultaneously, or take much too long to load all of them on startup, then optimizing this problem is not premature at all.

    And I think the real problem with premature optimization is not that one might waste programmer time optimizing what is actually a non-issue. The real waste often is the disproportionally larger amount of programmer time needed later to use and maintain that more complex optimized code.

  31. Duoae says:

    I’m going to be lazy and not read other responses so probably someone else already said the same thing:

    What you’re describing isn’t optimisation (IMO) but project design. You need to do your project design early on otherwise you don’t know what you’re going to need. Similarly to how the prototyping phase of a game helps to lock down features and play styles before going into expensive production, you’re just thinking what your project is going to be like at the end so you can design toward that goal.

    Maybe one of the differences in your use case is that you have always straddled the design and coding worlds, meaning that you can see what is required for the end product whereas maybe most other software engineers do not have this control/insight.

  32. I think this is the correct type of optimization decision to make at this point in time. It’s really an algorithm choice, not an optimization.

    I’ve been involved in a project that was full of premature optimization. Almost every function was written with multiple Boolean/integer template arguments to force to optimizer to generate multiple code blocks to avoid dynamic branches. There were thousands of likely()/unlikely() GCC branch prediction constructs littered throughout the code. Also tons of special “fast” control flow paths for the common cases. All of this bloated the code, made it unreadable, took tens of CPU hours to compile, and probably doubled development time. For what? A 20% speedup for the 10% of the code that actually matters? Isn’t all that time and effort better spent on algorithmic optimizations?!

  33. Onodera says:

    I don’t write games for a living, but I manage programmers that write business software for a living.

    Our standard approach has the following steps:

    – use your knowledge and experience to identify the bottlenecks of your design
    – use your knowledge and experience to pick the right tools to implement the bottlenecks
    – use abstractions to isolate parts of your design
    – continuously measure to identify performance issues
    – rework parts of your design that slow down the program

    Very rarely are abstractions the source of insufficient performance. Yes, you should use your (again!) knowledge and experience and avoid obviously bad abstractions, but even patently braindead stuff like using XML to send data from one function to another in the same process turned out to be mostly fine.

    I know games are not business software, but they are more similar than they look. A lot of indie games aren’t performance hounds, juggling only a bunch of sprites around. And close-of-business-day software needs every ounce of performance you can squeeze out of it, even if it’s not frame-to-frame.

    • StuHacking says:

      Well, I don’t know what kind of business software you write, but this doesn’t jive with our standard process:

      – use your customers’ complaints to identify the bottlenecks of your design
      – use the project architect’s hobby framework to implement the bottlenecks, because he’s really happy with the novel API he developed 10 years ago.
      – use abstractions to hide what the code is actually doing.
      – continuously firefight performance issues in production.
      – rework parts of your design that slow down the unit test suite/build process.

      /sarcasm ;-)

      • J Greely says:

        StuHacking: rework parts of your design that slow down the unit test suite/build process.

        I won’t name names, but there once was a company that planned to make a high-end disk array, and after lengthy development and extensive QA, their first real-world demo was a disaster that they never recovered from.

        It turned out QA had only been formatting the first 100MB of each drive whenever they kicked off the test suite, because if they did the whole thing every time, it took too long.

        -j

  34. MaxEd says:

    I’ll tell you what premature optimization is in two examples.

    1) At my first job, we wrote a MMO server. It used Lua scripts for character’s abilities, like Fireball etc. This being 2007, we were a bit worried about it: calls from Lua to C++ and back aren’t cheap. Could our server withstand the really huge load? So we finally did a performance test. And found out our networking team fucked up really badly. They were allocating a new buffer for every incoming packet, and the server went into death spiral at about 100 clients. We could have spent countless hours optimizing out Lua and replacing it with something more efficient, but in the end, it would be useless (Lua calls ended up so low on profiler’s results we probably had a long way to go before they became a problem).

    2) Premature optimization is spending a month or more on writing super-efficient simple physics system (basically, just multi-level heightmap, so players could run on bridges over ravines) for another MMO, stuffing more information into every byte so the code became quite impenetrable to anyone but the author, and then closing the game after an open beta where the number of players never went past 100. We could have released this code with WAY less optimal implementation (hell, use a 3rd-party library instead of writing our own code), and notice the lack of interest among players quicker, and spend more time on improving gameplay instead of wasting time and money on optimal code that never got used.

    In games, putting something out quick at first, and optimizing later is almost always a good idea. OK, if you can optimize something cheap – good. No need to use Bubble Sort if your language’s library has QuickSort already implemented. But don’t waste time coming up with a custom sorting algorithm that’s uniquely suited to your needs or spend a week choosing between HeapSort, QuickSort and MergeSort to squeeze out a theoretical 10% improvement in performance “if our game ever hit big” during the first month of development.

  35. default_ex says:

    This is definitely a topic I struggle with myself but all of the “premature optimization” callout nonsense has always been aggravating to me. I’m meticulous with my planning of a system. I would rather spend an extra hour with pen and paper than an extra day (or week) banging out details on the fly. However I have learned from working with others online that you must include your notes, even if they are crappy shorthand on the back of a napkin. It will save both you and whoever your working with a ton of time later, it’s amazing how well others can read your chicken scratch later when they really need to.

    However there certainly is a lot of people throwing around the words “premature optimization” as an excuse to avoid any sort of planning of a system. My rule of thumb is if it requires anything more than rudimentary algebra to implement, it needs to be planned out. UML diagrams are certainly a good place to start, a program like CEDAR logic that let’s you do UML and flow charts in one diagram helps tremendously. Even better if you make use of features in your IDE, especially with Visual Studio since it can generate stubs from your flow charts and UML that will automagically update as the code is changed.

    Premature optimization. Well if your whipping out bit twiddling hacks and abusing obscure language behaviors before you even test it. It’s pretty fair to call that premature optimization. Just have to discern between abusing languages features and using them as intended. The latter is much more likely to be understood by your peers.

    • saj14saj says:

      My experience with UML is that it is bringing $500/lb truffles to a backyard burger cookout, when some plain old fried mushrooms would have been fine. What domain do you work in that the formalism of UML is more helpful and productive than a quick old style boxes and lines with a few labels type sketch on paper, or its on screen equivalent?

  36. blue_painted says:

    In my not-so-humble opinion …

    Design is algorithm/structure choice before the system is running (or running again in case of a refactor), so getting this bit right is “doing it right the first time”

    Optimising is improving the performance of a system that is running. I’d go further … correct optimising is improving performance where it matters, and you can’t know that until it is all working.

    It therefore follows that “premature optimisation” is optimising before the system is running.

    • You are missing the interim stage “Programming” aka “Coding”. A stage that involves both “Design” and “Optimizing”.

      Edit: I guess this is where the “art of programming” comes in, those skilled have a perfect balance of both for the given situation/code.

  37. Vi says:

    Here’s an example from my own misadventures:
    Me: Ooh, Unity can make games for a phone?? Magic! I wonder how much graphic I can squeeze out of it!
    Internet: An old phone can run #### polygons, #### pixels of texture, and no transparency everrrrr.
    Me: Challenge accepted! I’ll just stuff every texture into a fancy, tinted, channel-split tilemap and read it eleventy times per polygon!
    Tablet: *commits harakiri*
    Me: *tries normal shaders with multiple textures and transparency*
    Tablet: *begins recuperating*
    Me: WHY DID EVERYONE LIEEEEEEE

  38. I can only speak for myself but “premature optimizing” is a vague term.

    I’m guessing it originally reflected to how you had a plan you stuck to it and you only optimized after you finished the code (i.e. a manual optimization pass).

    Now, I do those as well. But I also tend to optimize as I go. If I write function and in that function is a loop, I may rewrite the loop even before I’ve actually tested the code, just because I get a idea there and then how it could be written better.

    With functions like that I also prefer to add sanity checks to function arguments and error handling and make sure that data returned/changed by the function is within a sensible rage.
    “be flexible in what you read in and strict in what you write out” is a good mantra to code by.

    Usually I rarely go back to these functions until way later in a project when I’m doing a manual optimizing pass (code cleanup, comments, text formatting etc.).
    When possible I like to make/build a library of include files with functions I can recycle (aka a “include toolbox”), they kinda need to be fairly solid from the start.

    Debugging is also way easier if I got error handling in place. I usually use a macro to leverages the IDE debugger when testing but (optionally) can make use of a external debugger when finalizing a build. Something like https://docs.microsoft.com/en-us/sysinternals/downloads/debugview is serviceable. Although I’m contemplating making my own one that uses the same API as Mark’s tool, only pre-filtering the output of the program (so a end user don’t have to mess with filters and the log won’t get bloated).

    A recent addition to a project was adding a optional error log window, this is a hybrid of a error log and debug log. It shows errors but it also shows some informational notices about various stages of the program, things that do not affect the program but might be worth a look, like being unable to connect to a server temporarily. The user can also save this log. I refer to it as a debug log rather than a error log, as actual errors that prevent proper behaviour of the program actually shows a error message/box instead (aka critical error).

    Also, errors may have a error number but it also has plain English text describing the error. this bloats things but it’s way better if things aren’t cryptic. So it’s usually a error number or a function name + a error number then text Describing what went wrong for example: error 14 in function “update()”, DNS lookup failed.

    I try to think of these things “as I code”. Sometimes I just stare at a line that start with “for” and ponder if I should use “while” instead. Some might call that premature optimization, to me that is just “normal coding” really.

  39. Offtopic (sorta).
    but here’s a protip.

    Ever see urls to MSDN like this? https://msdn.microsoft.com/en-us/library/windows/desktop/aa363362(v=vs.85).aspx

    You can just trim off the (v=vs.85).aspx part so you get https://msdn.microsoft.com/en-us/library/windows/desktop/aa363362
    and it will work fine, and it won’t add it as you follow links further (or at least shouldn’t).

    This is nice if you post urls in forums or blogs that bug out when they encounter ( and ), it also makes urls shorter, and more “future proof”. I don’t think MSDN actually uses aspx any longer, that postfix in the url is just there for legacy reasons.

    Edit: Case in point, the url with the postfix breaks in this comment.

  40. Carlos García says:

    «Just code under the assumption that you have infinite memory, CPU cycles, and storage space, and then re-engineer everything later once you start running to hard limits?»

    I’ve got that suspicion for a long time, considering how I saw games raise their requirements exponentially without a equivalent improvement on quality or features.

    The moment that made that feeling more obvious was when, after playing Monaco Grand Prix Simulation 2 (MGPRS2) from 1997, with EA’s F1 99-02 and F1 02. MGPRS2 needed very little resources, cockpit was a bmp with small holes for the speed and wheel display of the car, the other cars were simple 3D models that for the time looked quite good and realistic. The EA’s F1 (current F1 games are from Codemasters) made the whole 3D model of your car with camera in the cockpit… the system requirements were hugely bigger than MGPRS2, but the 3D models only looked marginally better, they removed the career mode from MGPRS2 where you started in a bad team and then according to your performance you’d get hired by a better team or stay in yours (or perhaps drop, but I never had that bad of a season), they removed the ghost car for training, their cars had more trouble taking into account yours (a problem that is worse in the current F1 games I tried, F1 2010, 2015 and 2017): if you braked too early in MGPRS2 the car behind you would either avoid you and overtake or brake to avoid the crash unless you made a hard stop and they were right up your gearbox, in F1 games you brake early and even if the car behind has plenty of space to react they’ll still run into you (and on top of it you’d get blamed for it). All in all, for me MGPRS2 was a far better game: graphic quality was close to F1’s and it had more and better features and better AI when it came to you braking too early when you still didn’t know the braking points well. The only con of MGPRS2 was that it had a bug by which it didn’t scale well the accelerator with a wheel/pedal controller and forced you to use acceleration help to avoid slipping with the rear wheels, while F1 you could practice extreme caution when accelerating out of a turn. As for the most often criticism of MGPRS2 that I saw, it was “cars go in a line like a train”, that I could also see in the F1 games. The only point where I noticed the difference was the start of a race, where MGPRS2 would very quickly set into the train line and F1 games take some turns to get there, which I think it’s in the source of why MGPRS2 AI cars knew not to slam their cars into yours and F1 AI cars don’t.

    It is now the time that I still prefer the idea of playing the MGPRS2 than any of the later F1 games, including the current ones. Unfortunately, I can’t get MGPRS2 to run any more after I moved from Windows XP to 8.1 and now 10. The others can be played with keyboard with the exception of F1 2015 which has a bug that makes the controls not work right with keyboard more often than they do.

    • Carlos García says:

      Notes, as edit timer passed: don’t tell me F1 games need a wheel, I know it, I just can’t use it due to physical limitations.
      Also I forgot to add the F1 games removed the ability to save during a race first and later even in the middle of the weekend, which is important for me as I refuse to play non full length races but lack the time to play full races in a single go, and with some circuits my fingers start hurting too much before I can complete them.

  41. Kris May says:

    A good lesson in optimization by one of the itch.io devs: https://github.com/fasterthanlime/stop-optimizing-me

  42. Grumble says:

    Two things:

    1) Write your room interface in a way that doesn’t require rewriting the rest of your code base if you change the implementation to a b-tree or a hash table or a flat array.

    2) Once you’ve done analysis (how much memory do I have in my budget? How much memory will each of these approaches take?) your optimization is no longer premature.

    And a third, just for fun:

    I think what Knuth was talking about was doing things like loop unrolling and implementing duff’s device just to “squeeze out a tiny bit more performance” before you measure your code and find out what’s slow or wrong. I’ve worked on very high performance systems where some parts of the system needed to finish their tasks in 30 clock cycles (or else Bad Things happen) while other parts used bubble sort.

    Most of what was considered optimization in the ’70s is now done automatically for you by compilers and fancy CPU architectures. If you ever find yourself working on a 16-bit machine with 512 words of global memory and a 186 word deep stack (most of the projects at my last job) you might see places where other people “optimized” by putting all the global data in one giant struct so it could be packed efficiently with bit fields and such. However, that kind of thing leads to really awful and hard to maintain spaghetti. It’s much better to put modules’ persistent state in the modules, and build the system as though other modules need to ask nicely for access to that state. Then, if you do end up running out of memory you have some low hanging fruit and a little bit of work, but your API hasn’t been compromised by bad design decisions.

  43. mockware says:

    Hmmmf. Kid’s these days always wasting time (clock cycles) and expecting others to clean up after them.

  44. My perspective: your example is not “software optimization” so much as it’s “software design”. Ideally, you’d know up front how big your world is going to be, do a nearest-power-of-ten estimate of how much RAM is needed per room for object/NPC tracking, how much CPU you can devote to off-screen physics and NPC AI, how much GPU you can devote to occluded geometry, etc. etc. etc., then design your internal APIs around those design requirements.

    Later, when you realize “oh shtuff, my tricked-out developer workstation with 64 GB of RAM is hitting swap and this isn’t even a debug build”, then you profile to figure out where the resources are going, pick the biggest user (e.g. RAM used by in-memory rooms), then ask yourself “Hmm, we need to target 4 GB gaming PCs but we’re already unloading every room we can get away with unloading; can I reduce the RAM used by a loaded room by 10%? That would get us back under budget”. That is optimization.

    When you start deciding on things like “hmm, I could get away with an int16_t here instead of a plain int” and “oh shit, we have 40 booleans in this struct, let’s turn that into one uint64_t with bitpacking” and “hmm, I can run this loop 5% faster if I hoist that expensive function call out of the inner loop and use this Newtonian approximation instead of the exact value”, then you’re talking about the sort of optimization that Knuth was warning against, i.e. that should never be done prematurely.

  45. Lanthanide says:

    A lot of people have already likely covered this in one way or another, but I’ll throw in my 10c.

    It ultimately comes down to programmer time, which like CPU, memory, etc, is not infinite.

    1. Spending time working on optimising code path A, when it works correctly and performs perfectly acceptably, is time that you can’t spend on implementing new feature B, or fixing bug C, which are likely more important things to be doing.

    2. If you do go ahead and optimize A anyway, then more often than not, your ‘optimization’ will make the code more complex. This increases the chances of bugs being introduced, and will likely make it harder to debug and solve future bugs. This will potentially cost developer time in the future.

    Sometimes optimizing code can both speed it up and make it simpler. In which case that’s not strictly “optimization” because you’re not only gaining a performance benefit, but it is really “refactoring”.

    Whether you spend time refactoring code path A, instead of implementing feature B or fixing bug C, really depends on how critical code path A is to your application. If it’s a dead-end cul-de-sac that is rarely called, then if it operates correctly and doesn’t have any bugs and is not likely to be extended, then refactoring is likely to be a bad idea. But if it’s a crucial piece of the software, refactoring will likely pay you benefits over time.

    I just yesterday had an interesting angle on refactoring and premature optimization. Some code I wrote for our system 7 years ago to manage some hardware resources with peculiar rules needed to be extended to support a new feature on a new hardware platforms. In the 7 years since I wrote the code, there have been no bugs found and no significant changes made to it in any way (which is good since it took me 4 attempts to get it right). The new HW has a much more plentiful supply of the HW resources, so the approach the developer took was to simply clone my original management code and have it run on a separate chunk of the HW resource – since there’s enough available both managers can work independently without disturbing each other. A more pure approach would have been to refactor the original code to support the new features and new platform – it would be do-able, but the original code is fairly mind-melting as it is, so it wouldn’t be a lot of fun. But since the code has been rock-solid for 7 years so far, the most expedient approach was simply to make a clone of it. There’s a slim chance that in the future on this new platform we may end up with resource contention between the two independent managers as we add more features, at which point in time we’ll be forced to refactor the two separate engines into a single engine that can handle all of the features together.

    But I think that’s a nice example of where we have avoided premature optimization / refactoring work – the code that we have right now works perfectly and there’s enough resource available to meet our needs, so why spend more time on it than we need to? We’ve got heaps of other things to be getting on with.

  46. EmmEnnEff says:

    I am of the opinion that any non-trivial[1] optimization that takes place before you actually measure your performance is premature.

    Unless you have a lot of experience in solving with the exact problem you are trying to solve… Your first guess as to where your performance bottleneck is going to be will probably be wrong. Thanks to Amdahl’s law, the programming hours that were wasted in optimizing a non-performance critical path through the system (Often at the cost of readability, maintainability, or correctness) will not have a noticeable impact on performance.

    It’s like the old proverb about how you should measure once, cut twice. Measure, then decide if you still need to optimize.

    The same can be said about almost any piece of best programming practices – making code generic, making code portable, making code DRY. Why wouldn’t you want your code to be generic, and easy to extend? Why wouldn’t you want to avoid any copy and paste? Are you some kind of savage, who doesn’t care about portability?

    Alas, not all code will be extended, sometimes repeating yourself is better then bolting in an inappropriate nonsense with a dozen AbstractFactoryModuleBuilderFactory classes, and maybe porting your game to Android will make you less money then you spent on building your beatifully-engineered, completely-generic, platform-independent framework.

  47. PowerGrout says:

    A little late to the party but nonetheless a perfect occasion for a favorite aphorism of mine.

    “Nothing wastes as much time as trying to save time”

    Surely not the root of all evil no, but I can see what the guy’s getting at.

    • J Greely says:

      One of the anecdotes in Jon Bentley’s “Writing Efficient Programs” is about a programmer who spent an entire week optimizing part of a FORTRAN compiler in the 1960’s, only to discover two years later that his new code had never been called in over 100,000 compilations.

      -j

Leave a Reply

Comments are moderated and may not be posted immediately. Required fields are marked *

*
*

Thanks for joining the discussion. Be nice, don't post angry, and enjoy yourself. This is supposed to be fun.

You can enclose spoilers in <strike> tags like so:
<strike>Darth Vader is Luke's father!</strike>

You can make things italics like this:
Can you imagine having Darth Vader as your <i>father</i>?

You can make things bold like this:
I'm <b>very</b> glad Darth Vader isn't my father.

You can make links like this:
I'm reading about <a href="http://en.wikipedia.org/wiki/Darth_Vader">Darth Vader</a> on Wikipedia!

You can quote someone like this:
Darth Vader said <blockquote>Luke, I am your father.</blockquote>