
Quakecon Keynote 2013 Annotated: Part 3

By Shamus
on Wednesday Aug 7, 2013

Link (YouTube)

As before: In the process of going through this I’m bound to commit minor omissions, errors, misunderstandings, grammatical errors, or war crimes.

Times are approximate.

28:30 “You can put multiple Titans into a PC and render absolutely incredible amounts of FLOPS and Vertexes and Textels”.

He’s talking about the huge horsepower of top-end PCs. You can put multiple Titan graphics cards into a single PC so they can share the load and work even faster. (This is insanely expensive, but also insanely fast. I think one Titan will set you back about a thousand bucks, as of this writing.)

FLOPS are FLoating-point Operations Per Second. When we’re talking about how fast a processor can crunch numbers, we often measure performance in FLOPS.
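If you want to ballpark this yourself, the usual back-of-the-envelope formula is cores × clock speed × operations per cycle. Here’s a quick sketch in Python; the Titan numbers are approximate published specs, used purely for illustration:

```python
# Back-of-the-envelope peak FLOPS: shader cores x clock x FLOPs per cycle.
# The Titan figures below are approximate published specs, for illustration.
def peak_flops(cores, clock_hz, flops_per_cycle=2):
    # 2 FLOPs per cycle assumes one fused multiply-add (FMA) per core
    return cores * clock_hz * flops_per_cycle

titan = peak_flops(cores=2688, clock_hz=837e6)  # GTX Titan, base clock
print(f"{titan / 1e12:.1f} TFLOPS")             # roughly 4.5 TFLOPS
```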

A texel is a “texture pixel”, and in this context it’s basically a measure of how fast the graphics card can fill up the screen with all the stuff it’s trying to render.
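To get a feel for the numbers, here’s a rough and entirely hypothetical sketch of how many texels a game might chew through per second. Every on-screen pixel may sample several textures, overlapping geometry (overdraw) multiplies that, and it all happens every frame:

```python
# Rough texel throughput a game needs. Every screen pixel may sample
# several textures (color, normal map, specular...), and overlapping
# geometry (overdraw) multiplies that. All numbers are hypothetical.
def texels_per_second(width, height, samples_per_pixel, overdraw, fps):
    return width * height * samples_per_pixel * overdraw * fps

rate = texels_per_second(1920, 1080, samples_per_pixel=3, overdraw=2, fps=60)
print(f"{rate / 1e9:.2f} gigatexels/s")  # under a gigatexel even for this modest case
```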

32:30 “We still have DXT compressed textures that turn a lot of things into garbage.”

In this section he’s talking about the weird performance myopia some people exhibit, where they’ll polish some small detail while ignoring glaring technical flaws in other areas. It’s kind of like someone carefully waxing and buffing their car to get all the fingerprints off when it’s got huge dents and rust holes.

DXT is a way of compressing textures. It saves you graphics memory, but it sometimes degrades image quality. This article has some side-by-side shots.
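For a sense of the savings: DXT compresses textures in fixed-size 4×4 pixel blocks, so you can compute exactly how much memory you get back. A quick sketch, using the standard DXT1/DXT5 block sizes versus raw 32-bit RGBA:

```python
# DXT compresses in fixed-size 4x4 pixel blocks: 8 bytes per block for
# DXT1, 16 for DXT3/DXT5, versus 4 bytes per pixel for raw 32-bit RGBA.
def texture_bytes(width, height, fmt="raw"):
    if fmt == "raw":
        return width * height * 4
    blocks = ((width + 3) // 4) * ((height + 3) // 4)
    return blocks * (8 if fmt == "dxt1" else 16)

w, h = 1024, 1024
print(texture_bytes(w, h) // 1024, "KB raw")      # 4096 KB
print(texture_bytes(w, h, "dxt1") // 1024, "KB")  # 512 KB, an 8:1 saving
print(texture_bytes(w, h, "dxt5") // 1024, "KB")  # 1024 KB, a 4:1 saving
```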

33:00 “Everybody bakes 8-bit normals.”

“Baking”, in this context, is the process of pre-computing some static (unchanging) data to be used in the game. For example, if you have “baked in” shadows, then the shadows are calculated during level design, and they can’t move around once the game is running. Half-Life 2 has baked shadows. Doom 3 doesn’t have baked shadows, which means they can move around, change, etc.

“Normal maps” are “bump maps”. (It’s a bit more complicated than that. But it’s not worth getting into.) Most textures are color data. Like, if you put a picture of bricks on a polygon then it looks like a brick wall. The problem with using plain texture maps is that they don’t react to light the way their surface might suggest. If you shine a light down the face of a brick wall, you won’t see the light hitting the tops of bricks and the undersides being dark. The illusion of a brick wall is broken and it suddenly looks like a wall with brick-patterned wallpaper. Which is pretty much what it is.
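You can see the problem in a few lines of diffuse (Lambert) lighting math. This is a toy sketch, not real shader code: brightness is just max(0, N·L), so with one shared normal every pixel of the wall comes out identical, while per-pixel normals let brick tops catch the light and undersides go dark:

```python
import math

def normalize(v):
    m = math.sqrt(sum(x * x for x in v))
    return tuple(x / m for x in v)

def lambert(normal, light_dir):
    # Diffuse brightness: dot product of unit vectors, clamped at zero
    n, l = normalize(normal), normalize(light_dir)
    return max(0.0, sum(a * b for a, b in zip(n, l)))

light = (0.0, 1.0, 0.5)  # a light up above, shining down the wall

# Flat wall: one normal for everything -> identical brightness everywhere
flat = [lambert((0, 0, 1), light) for _ in range(4)]

# Bumpy wall: normals tilt up on brick tops and down on the undersides
bumpy = [lambert(n, light)
         for n in [(0, 0.5, 1), (0, 0, 1), (0, -0.5, 1)]]

print(flat)   # all the same value
print(bumpy)  # bright brick top, medium face, dark underside
```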

But if you use a bump map, you can get those bricks to be lit the way the eye expects, and it can feel like the wall is actually bumpy. It takes another whole texture to pull this off, but that’s pretty cheap compared to the visual gain. Here’s a couple of screenshots I took in Doom3 BFG edition:

Left is without bump maps. Right is normal view. Note the keyboard and face. Click for LOLHUGE! view.

“Baking normal maps” is a process where you let your artist go crazy and make some million-polygon model. Like, they can model the individual creases in the monster’s skin, the little rivets on the armor, everything. Then you run a program to smooth out all those little bumps and turn the million-polygon model into a thousand-polygon model. But! You take all those bumps you sanded off and save them on a bump map. Again, refer to the keyboard in the screenshot above. The bump map makes it LOOK like there are individual keys sticking up. You’d have to get your eyes level with the keyboard to see that the keys don’t actually protrude from the surface. It’s just a lighting trick.
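Real bakers ray-cast from the low-poly surface to the high-poly model and work in tangent space, but the core idea, turning tiny surface details into per-texel directions stored in a texture, can be sketched by deriving normals from a height field with finite differences. Purely illustrative:

```python
import math

# Illustrative only: derive per-texel normals from a height field using
# finite differences. Bumps in the heights become directions in a texture.
def bake_normals(height):
    h, w = len(height), len(height[0])
    normals = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Slope in x and y from neighboring texels (clamped at edges)
            dx = height[y][min(x + 1, w - 1)] - height[y][max(x - 1, 0)]
            dy = height[min(y + 1, h - 1)][x] - height[max(y - 1, 0)][x]
            m = math.sqrt(dx * dx + dy * dy + 1.0)
            normals[y][x] = (-dx / m, -dy / m, 1.0 / m)
    return normals

# A perfectly flat patch bakes to straight-up normals, effectively (0, 0, 1)
flat = bake_normals([[0.0] * 4 for _ in range(4)])
print(flat[1][1])
```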

37:00 “Separate memory buses are the way things have been since the beginning.”

There’s your base computer. Then you’ve got a graphics card plugged into it. Each one has its own bank of memory and they can’t share. On the consoles, this is not the case. On the next-gen consoles (and also the Xbox 360 from the current generation) there is just one big pool of memory for the programmer to use as they like.

This becomes important because while the computer is fast, and the graphics card is fast, the way they communicate is kind of slow. So if your game is drawing to a texture, then you have to send the texture to the graphics card, then do some operations on it, then maybe pull it back over to the computer and do something with the results. Moving the texture back and forth like this is messy and expensive, time-wise. It’s not that consoles are faster at doing this, it’s that they don’t need to do it at all!
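Some rough arithmetic shows why this hurts. The bandwidth figure below is a ballpark for a PCIe 2.0 x16 bus, not a benchmark, and everything else is illustrative:

```python
# Rough cost of moving a texture across the bus between main memory and
# the graphics card. Bandwidth is a ballpark for PCIe 2.0 x16.
def transfer_ms(width, height, bytes_per_pixel=4, bandwidth_gb_s=8.0):
    size_bytes = width * height * bytes_per_pixel
    return size_bytes / (bandwidth_gb_s * 1e9) * 1000.0

round_trip = 2 * transfer_ms(1920, 1080)  # send it over, pull it back
frame_budget = 1000.0 / 60                # about 16.7 ms per frame at 60fps
print(f"{round_trip:.1f} ms of a {frame_budget:.1f} ms frame budget")
```

Burning a couple of milliseconds of a 16.7 ms frame on a single round trip is why programmers avoid this dance whenever they can.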

Comments (34)

  1. Cuthalion says:

    Awesome. Thanks for the annotations. I listened to the talk at work last night and managed to follow it pretty well until the second half, where I sort of barely understood it. Looking forward to more enlightenment.

  2. Alex says:

    I like listening to the talks and reading your annotations.
    The funny thing that Carmack did not mention, though, is that we don’t really have the tools yet to work with 16 or 32 bit normals. Yes, Photoshop will take them, but the amount of stuff that you can actually do with them is rather limited (not as many filters or layer effects). At that point we might as well go over to object normals rather than tangent space. Also, speed while working with a layered 16 or 32 bit texture might be a bit of a nightmare, depending on how intensive you are with your layers.

  3. I’m fairly sure you’ve got your last point completely backwards there, Shamus. All current and future consoles and PCs (except, I believe, the PS3?) use isolated memory for the GPU and the CPU. Carmack is making the point that this really isn’t necessary anymore and creates a bottleneck between the GPU and CPU when data needs to be manipulated by both components, as it needs to be copied between each isolated memory location.
    Carmack believes, and rightly so, that GPU vendors need to drop their memory entirely and directly access the memory you stick into your motherboard.

    • Bryan says:

      Unless I’m missing which part you’re replying to, that’s exactly what Shamus said, though. “Each one has its own bank of memory and they can’t share. … Moving the texture back and forth like this is messy and expensive, time-wise.” :-)

      The consoles that “don’t need to do it at all” are the next-gen ones and the xb360, if I understood the first paragraph correctly…

    • Phill says:

      All current and future consoles and PC's (Except I believe the PS3?) use isolated memory for the GPU and the CPU.

      The PS4 certainly has a unified memory model – there is a straight 8 GB of memory that is directly accessible to both the CPU and the GPU. In fact this is one of its main performance-enhancing features: you don’t have to waste time copying texture data from main memory to GPU memory.

      Not sure about the Xbone 180’s architecture; I believe it is primarily a unified memory system like the PS4 but with some wrinkles, although since I’ve not had anything to do with it yet it’s all very vague to me. Also they are using different types of memory (GDDR5 for the PS4 vs DDR3 for the Xbone, which affects how they can do certain tasks).

      • Peter H. Coffin says:

        Considering how boulder-strewn the “normal” x86 memory mapping and allocation is to support features written all the way back to “640k should be enough for anybody” days, the “a few wrinkles” in xbone’s mapping are close enough to unified for Windows programming… *grin*

        • Bryan says:

          Having looked at an e820 map (the thing where the OS asks the BIOS what physical address ranges are of what type), yes, totally agreed.

          BIOS-e820: [mem 0x0000000000000000-0x000000000009e7ff] usable
          BIOS-e820: [mem 0x000000000009e800-0x000000000009ffff] reserved
          BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
          BIOS-e820: [mem 0x0000000000100000-0x00000000cd847fff] usable
          BIOS-e820: [mem 0x00000000cd848000-0x00000000cdb29fff] reserved
          BIOS-e820: [mem 0x00000000cdb2a000-0x00000000cde79fff] ACPI NVS
          BIOS-e820: [mem 0x00000000cde7a000-0x00000000cea3dfff] reserved
          BIOS-e820: [mem 0x00000000cea3e000-0x00000000cea3efff] usable
          BIOS-e820: [mem 0x00000000cea3f000-0x00000000cec44fff] ACPI NVS
          BIOS-e820: [mem 0x00000000cec45000-0x00000000cf06dfff] usable
          BIOS-e820: [mem 0x00000000cf06e000-0x00000000cf7eefff] reserved
          BIOS-e820: [mem 0x00000000cf7ef000-0x00000000cf7fffff] usable
          BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
          BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
          BIOS-e820: [mem 0x00000000fec10000-0x00000000fec10fff] reserved
          BIOS-e820: [mem 0x00000000fec20000-0x00000000fec20fff] reserved
          BIOS-e820: [mem 0x00000000fed00000-0x00000000fed00fff] reserved
          BIOS-e820: [mem 0x00000000fed61000-0x00000000fed70fff] reserved
          BIOS-e820: [mem 0x00000000fed80000-0x00000000fed8ffff] reserved
          BIOS-e820: [mem 0x00000000fef00000-0x00000000ffffffff] reserved
          BIOS-e820: [mem 0x0000000100001000-0x000000042effffff] usable

          Why so much “reserved”? I do recognize several of those physical address ranges, though, from back in the days of DOS. (e0000 through fffff, in particular, used to be ROM BIOS I believe. That the last 6K of 0-0x9ffff is reserved is surprising, too.)

          Also, where did the other ~50M of memory go, from this 16G system? Adding up all the “usable” lines only gives 15.949G. (Better than when I started poking at BIOS settings. Used to be ~10x that.)

          Luckily, nobody except kernel developers has to deal with this; we all get a flat 4G or 16E address space (depending on pointer size)…

    • Alexander The 1st says:

      IIRC, you’ve got that situation mixed up; the main reason PS3 games were hard to develop for memory-wise, as opposed to the Xbox 360, is that of the 512 MB of RAM each had, the PS3 would only allow 256 MB for graphics and 256 MB for engine-related stuff.

      Which caused the problems that Bethesda had, for example – having to dump stuff from the engine’s RAM much earlier than the XBox 360 did, because the video RAM just wasn’t being maxed out and yet couldn’t be used for the engine.

  4. Adam P says:

    Bump mapping and normal mapping are not interchangeable.

    A bump map is basically a height map. You can see a heightmap on this post from Project Hex. Like height maps, bump maps are grayscale images. However, a height map is typically used for displacement. Bump maps don’t displace the geometry; they just inform the GPU where lighting should be faked. Simple highlights and shadows, basically. Fingerprints instead of folds in cloth.

    A normal is something Shamus covered while working on Project Octant. These are the mostly-blue images we all know and love. If you don’t know the blue images, they’re quite lovely (if a bit sad). A normal map fakes geometry by telling the GPU how light should be bouncing off of an object. This is useful for things like keys on a keyboard which are supposed to have four sides and a top; if Shamus stood a little to the left or right in the screenshot above, the highlights on the keyboard would look different.

    Bump maps can be used to fake normals, but that requires the GPU to do more work by calculating the normal on-the-fly. Normal maps skip that calculation, but use three times the memory.

    To complicate things, both bump maps and normal maps fall under the category of “bump mapping.”

    • Zukhramm says:

      They’re basically interchangeable. They’re both describing the same actual data (a surface), and going from one to the other is relatively easy (I guess, disregarding discretization). They are essentially the derivative/integral of each other, trading which information is easily accessible, and space vs time. I would not call using bump maps to calculate normals “faking it” any more than any computation is “faking it”.

      • Karthik says:

        How is one the integral/derivative of the other?

        • Zukhramm says:

          If you have a normal you have the tangent plane. So you’ve got two sets of data, one defining the elevations of a surface, and one the slopes. The normals can be calculated using the gradient, but of course, with discretized data that is not described by a function you’re probably doing it numerically instead.

        • Volfram says:

          A bump map denotes location, a normal map denotes slope, or rate of change of that location. The normal map is the derivative of the bump map, the bump map is the integral of the normal map.

          (I hope that’s less opaque)

    • Volfram says:

      That’s basically the “more complicated but not worth getting into” part. Bump and normal maps fulfill the same purpose; they just go about it using two different techniques.

      I’m pretty sure parallax mapping (which changes how a given portion of a surface is actually drawn based on the camera perspective) works best with bump maps. There was a parallax mapping plugin for Doom 3 shortly after it came out, and one comment the modder made was that it vastly improved his ability to pick out details in low-light environments, meaning he didn’t need his flashlight as much.

      Zukhramm’s replies are mostly correct. You typically generate normal map data from bump maps, and it’s possible to generate bump map data from normal maps (integral and derivative). Of course, doing this in-system is slow, and normal maps provide a generally better lighting model than bump maps do.

  5. Clint says:

    > A texel is a “texture pixel”, and in this context it's basically a measure of how fast the graphics crad can fill up the screen with all the stuff it's trying to render.

    “Crad” should probably be “card”.

  6. Tse says:

    Wow, I never realized how low the polygon counts of Doom 3 were. I am used to doing high-poly work for visualization, so I still get a little surprised by what you can achieve with just a normal map.

    • Volfram says:

      Master Chief’s MJOLNIR armor in Halo 2 actually had fewer polygons than the Halo 1 version, but it looked more detailed, in part due to Bungie’s extremely effective use of normal maps and lighting effects. Never underestimate what a good texture shader can do for you.

      What I really like is that on the kiosk monitor, the normal map smooths down near the edges, so that it doesn’t suggest a different silhouette from what the polygons present. And yeah, it’s funny that the un-mapped versions effectively don’t look any different from, say, Deus Ex 1.

      The baking process, by the way, was probably developed independently by several different studios. I first heard about it from a group called Diversions Entertainment, when they were describing a technique they’d developed for mapping data from a ludicrously high-detail model onto a much-simplified one. (The example given was a simple cylinder being used to represent a soda can with the top lip, a dent in the side, and water condensation. It would have been even cooler if they’d had a side-by-side comparison in the article.)

      [edit] I just took a closer look. My FAVORITE part is where the hands of the guy at the computer are done the same way as hands in the PS1 game Battle Arena Toshinden: thumb, index finger, “fingers.” That is some really impressive shader work.

  7. Zak McKracken says:

    For an article focusing on pixel-peeping image quality issues, that Gamasutra thing sure uses some highly-compressed JPEGs… most of the image quality problems in that article come from JPEG compression, especially in the menu screenshots. Ever heard of PNG?

    Related question: Would it be possible to use (for example) PNG compression for textures, or is DXT hard-wired into graphics cards these days?

    • Cuthalion says:

      Well, Battle for Wesnoth uses pngs for its sprites, as do I, but I’m not specifically communicating with the graphics card, and I have no idea if Wesnoth is, either. And in both cases, I believe it’s just 2d images on their own, rather than textured onto a 3d model. So I’m not sure if that means anything.

      • Carlos Castillo says:

        W.R.T. Wesnoth, and other games that use PNG files, it may be the case that one of PNG’s strengths (lossless, smaller) is desired, or it may also be the case that PNG is an open format with many good, free, and fast compressors, while DXT compression is often very asymmetrical (slow encoding vs decoding) and has expensive tools (the good ones, at least).

        Similar situations have arisen in other game-related fields:

        Bink Video was for the longest time the best way to have cross-platform, efficient, small FMV files, but it costs a licensing fee to use, and encoding is terribly slow, so it was used mainly in AAA titles. It is still top dog in its field, but it may be displaced soon by a royalty-free format such as VP8/VP9 from Google, which have continually improving encoders/decoders.

        Ogg Vorbis has already done so for audio in AAA games, supplanting the use of MP3. Opus may do the same in turn to Vorbis, as it’s measurably better, still royalty free, and can also be used for realtime audio chat (Vorbis, like MP3, imposes a huge latency).

        • Volfram says:

          AGH! I hate Bink video so much!

          Far too many games rendered unplayable because of codec problems.

          I will have to look into Opus. I’m currently using .ogg files for audio for a couple of reasons, and that’s not likely to change for this project.

      • HiEv says:

        You’ve missed Zak’s point entirely. He wasn’t talking about a game, he was talking about the Gamasutra article that Josh linked to.

        The example images in that article are terrible, for multiple reasons.

        The first image attempts to show how badly DXT affects simple text, but the image is only a tiny bit blurrier, barely perceptibly so IMHO. And worse, the “i” in “compression” is somehow a little bit clearer in the DXT version.

        The next image shows a “comparison” where the DXT version has had its differences exaggerated by manipulating the saturation and brightness, so we never see a true comparison of what the difference would be normally.

        Those images were GIFs, which is fine for images with few colors, where GIF remains a lossless image compression format (it’s limited to 255/256 colors, depending on transparency). However, the rest of the article uses JPEG, which is a lossy image compression format. That means the rest of the images in the article are misleading: we’re seeing JPEG compression on top of every image, DXT compression or no, so only misleading comparisons remain.

        If you want to accurately compare two images, especially when comparing compression damage, you need to use PNG. PNG is a lossless image compression format that handles far more color depths than GIF.

        Because of all of that, the article’s sample images are very misleading.

        I came here to rant about this a bit, as it’s my area of expertise, but Zak beat me to it.

        • Cuthalion says:

          We’re agreed that the article example images are terrible, for exactly the reasons you stated. I was responding to just this part:
          Related question: Would it be possible to use (for example) PNG compression for textures, or is DXT hard-wired into graphics cards these days?

        • Volfram says:

          Yeah, about the part where they were using the picture of the terrorist I couldn’t tell the difference, because his leg looked like one giant JPEG artifact to me. That was worse than any of the DXT artifacts they were trying to point out.

          Article was just put together poorly overall. I’m not sure about the content, I stopped reading when I noticed the problems you pointed out in the pictures.

    • Carlos Castillo says:

      The two image formats you’ve listed, DXT and PNG, have very different properties. Note that there are other texture compression formats (e.g. S3TC), but I will be using DXT as a shorthand for all texture compression algorithms. Similarly, PNG is used as a shorthand for all lossless image formats (e.g. WEBP, BMP, GIF).

      PNG images are lossless (no image degradation) and are meant to compress image data as small as possible, to be loaded from disk. PNG uses algorithms (essentially the Deflate algorithm used by zip files) to find and squeeze out redundant information in the pixel data.

      DXT images are lossy, but have a fixed compression ratio (4:1, 6:1, etc…) depending on which version you use. The different versions of DXT are specialized for certain types of images (texture maps, normal maps, etc…). Two big advantages of DXT are that the compressed data is a known (fixed) size, and that the algorithms to decompress them (but not necessarily compress them) are simple (ie: fast) and take a fixed amount of time, meaning that GPU makers can build a cheap chip to process it.

      Graphics cards (when producing graphics) can only handle raw pixel data or DXT as input, because anything else would take an unknown amount of time to process. GPUs operate by having many small parallel pieces work in lock-step, so if it took a variable amount of time to decompress image data, you’d be stuck processing a batch of pixels at the rate of the slowest pixel.

      Aside from that technical limitation, there are two other reasons to use compressed textures.

      First, if you have an unknown compression ratio, it makes memory “lost” due to data fragmentation much more likely, since textures must exist in contiguous blobs to be accessed as efficiently as they are. Files on your Hard Disk fetched by the OS can be chopped into pieces to fully use your disk’s capacity, but this operation means data retrieval is a variable-time affair.

      Second, as mentioned by Shamus in the article, textures must be moved from one set of memory to the other, more so on older hardware where GPU memory is isolated from main CPU memory. Since the GPU only handles raw pixel data or DXT compressed data, moving DXT data is significantly faster (because it’s 2-8 times smaller). Even with unified CPU/GPU memory, that memory still needs to be accessed by the GPU’s much faster processor cores, so loading the data for X times as many pixels (because it’s compressed) can still be a significant win, if the GPU can turn the data into the pixels it actually needs faster than it could load X times as much data.

      In the future, with more platforms unifying more of their memory (as Carmack envisions), it may become more efficient to PNG encode data on disk, load it with the CPU into RAM, and access that memory with the GPU. This may in fact be a distinction between the PS4 and Xbone: since the PS4 uses faster memory (in general), the speed advantage of using DXT textures may be less important for it than for the Xbone.

    • Volfram says:

      Yeah, the first couple of comparisons were very clearly “Oh yeah, that DXT example looks TERRIBLE!”, but by the later pictures I had given up because the source material was too muddied for me to tell the difference between the “good” and “bad” image.

      FYI, I am also using PNG files for the game I am working on. At least, the textures which aren’t dynamically generated… In my case, it lets me include transparency directly into the file.

      The way I see it, for spritework, you really only have 2 file format options: PNG and GIF. BMP is uncompressed and entirely too large, and JPG will add noise that can’t be accounted for, and can’t do transparency.

      • Carlos Castillo says:

        Generally speaking, for a sprite based game, you won’t be taxing your GPU in any meaningful way, or at least in ways that texture compression will show significant benefit. In that situation, you could then choose an image format which would have benefits for development (ease of use, lossless) and distribution (size, licensing), instead of raw performance on the user’s machine.

        When you use an image format that is not supported by the GPU, you are essentially decompressing it “yourself” with the CPU, and then loading that raw-uncompressed data to the GPU.

        Also, it is possible to have any image format without an alpha channel support transparency: you just need to load two images (one a grayscale image that acts as the alpha data) and combine them yourself, either beforehand on the CPU or at draw time on the GPU (in a shader). Formats that support transparency just make transparent textures much easier to implement.
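A minimal sketch of that combining step, with pixels as plain tuples to keep it self-contained (a real game would do this with an image library or in a shader):

```python
# Combining a separate grayscale mask into an alpha channel: per pixel,
# RGB comes from the color image and A from the mask. Purely illustrative.
def apply_mask(rgb_pixels, mask_pixels):
    return [(r, g, b, a) for (r, g, b), a in zip(rgb_pixels, mask_pixels)]

color = [(255, 0, 0), (0, 255, 0)]  # a two-pixel "image"
mask = [255, 0]                     # first pixel opaque, second invisible
print(apply_mask(color, mask))      # [(255, 0, 0, 255), (0, 255, 0, 0)]
```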

        PS: I know that it is possible to have your graphics API (OpenGL, DirectX) compress the textures as DXT after loading them from disk, but before sending them to the GPU, but those compressors suck, mostly because they need to be fast, and the better algorithms are usually proprietary ones.

        • Volfram says:

          So basically on the “good formats for spritework” end, I was right but more by being lucky than clever.

          I can live with that.

          I’m aware that an alpha mask can be used in lieu of an alpha channel, and in fact a decade ago that’s the only way I knew of to do it. The image library I wrote for myself can also split and recombine the 4 channels in arbitrary order, from nearly any dataset. As you said, though, having an image format that supports alpha channels is just so much easier.

  8. Maryam says:

    Although I’m not actually watching the videos, I like reading these posts anyway. I’ve never actually known what a bump map was. Thanks for explaining, Shamus.

  9. Chamomile says:

    Two things:

    The 28:30 timestamp has the period after the quotation marks, when the rest of them have the period inside of the quotation marks.

    At the 33:00 timestamp you execute a bunch of POWs, which is a violation of the Geneva Conventions.

  10. Kdansky says:

    I think he talked about this in this part too: Refresh rate.

    If your game runs at 30 Hz, or at a flickery 60 Hz (but with VSync off and tear lines), it will look so much worse than at a fluid 60 or even 120 Hz, even at lower resolution, details or effects.

    To test that out: Load your favourite current game up, and set AA to the highest setting (x8, usually) you can find, switch Vsync off and play a bit, and then pull it all the way down to 2x, but add Vsync and compare. Note: This won’t work for Skyrim, because its Vsync implementation results in catastrophic mouse lag.

    Dark Souls is a great example to demonstrate how ugly non-Vsync can be, because you can either have 30 FPS (fixed), or 60 FPS (with unbelievable tear lines) when you add DSFix.

  11. Michael Pohoreski says:

    GTX Titan scaling is all over the place. With some games, having 4 Titans makes performance go DOWN, e.g. BF3 at 1920×1080.

    If you are strictly GPU bound (such as when using CUDA) then yes, 4-way has linear scaling. Load balancing CPU+GPU is extremely tricky to do fast. At the moment BOTH AMD and nVidia have extremely poor 4-way scalability for games.

