Frontier Rebooted Part 4: Stuck in a Rut

By Shamus Posted Sunday May 18, 2014

Filed under: Programming 87 comments

I’ve implemented erosion simulation a few times in my career. My usual technique has you begin at a single spot on the map, landing like a virtual raindrop. From there, you examine the immediate surrounding points and see which one is lowest. Then you move there and repeat the process. As you go, you shave a tiny little bit off of each point. If you want to get fancy, you look at the steepness of the slope. If you’re going on a nice downhill then you assume this theoretical trickle / stream / river is moving fast. You might pick up a little extra dirt (thus making the riverbed deeper) and taking it with you. When the land levels out, you assume the flow has slowed and you drop off a bit of what you’ve collected.

And if the erosion code doesn’t pan out, maybe we can repurpose it into a sled-riding simulator.
And if the erosion code doesn’t pan out, maybe we can repurpose it into a sled-riding simulator.

Eventually you hit the ocean or find yourself at the bottom of a hole. Here you drop off whatever you’ve collected, which ought to help form sloping beaches, river deltas, and gradually fill in craters. At this point you’re done. Now pick another point on the map, drop another raindrop, and start the whole thing over.

It’s not perfect. This would be useless to a geologist or other science-type person trying to study science-type stuff. But it’s perfectly good for making a plausible landscape.

This is nice, but shaders simply can’t do this kind of processing. You can’t say, “I’m done with grid coord X, Y, now let me move next door to X+1, Y”. You can’t just store values globally, and you can’t stop processing when you feel you’re done. (Unless you don’t want to generate any output. In which case you just wasted your time.) When your shader is executed, you’re dropped into a situation where you’re expected to do the calculations of one pixel, and only that pixel. You can’t change any adjacent pixels and you can’t carry values between them using variables. You can LOOK at adjacent pixels, but because processing is heavily parallelized, you can’t even guarantee that points will be handled in any particular order.

So how do we handle large processing jobs like this, where we have a lot of inter-dependency and changes that need to propagate in unpredictable ways?

There are probably a lot of solutions, but the system I came up with requires one new texture (for holding erosion data) and two new shader passes.

So our new texture is called the “erosion map”. We look at whatever pixel we’re given, and check the heightmap for its immediate neighbors to the north, south, east, and westI experimented with using all 8 ordinal directions, but to my surprise it looked worse.. We look at which one is lowest. This means there are five possible outcomes: north, south, east, west, and none. (“None” happens if we’re lower than all our neighbors.) We jam this value into the red channel of our erosion map.

Next we store the number 1 in the green channel. I’ll explain that in a minute.

Next we look at the red channel of our four neighbors and see how many of them are pointing at us. If they are, then that means they are tributaries of us. We take their green channel and add it to our own. This means the green channel is effectively the number of cells upstream of us. If we’re at the exact pinnacle of a mountain, this will be 1. (Remember we set the green channel to 1 a couple of paragraphs ago.) If we’re at the bottom of a valley then we could have hundreds of tributaries. (Although remember that since we’re saving this to a one-byte color channel, we can only hold 256 unique values. Which means that we basically lose track after 255.)

Doing things this way is a bit odd. If we have a chain of tributaries 100 units long, it will take 100 updates for everything to fully propagate.

After we update the erosion map, we do another pass and update our heightmap. The heightmap looks at the erosion map. The more tributaries a point has, the more elevation is eroded away. Then after we update the heightmap we have to update our normal map again, since the topography has changed a bit.

The erosion has begun to work its magic. That river was carved by our simulation. You can also see ridges forming.
The erosion has begun to work its magic. That river was carved by our simulation. You can also see ridges forming.

So we have a nice little chain of updates going now. 1) Update the erosion map to see where erosive forces exist. 2) Update the hieghtmap to apply those erosive forces. 3) Update the normal map, since the shape of the hills have changed. And since the hills have changed, we need to go back to #1 again.

The texture for our world is pretty big: 2048×2048. That means the heightmap, the normal map, and the erosion map all need to be that size. The erosion processing is pretty expensive. If we tried to update the whole thing in one frame our framerate would tank. So we update it in patches, spending 16 frames updating the whole thing. Then – just to keep things nice and unified – we spend another 16 frames updating the heightmap, and another 16 doing the normal map. So the whole cycle takes 48 frames. Since we’re running at 60fps, this means the erosion system runs at a little better than a frame a second. That’s fine. The changes it makes are incredibly slight. If they were big, the terrain might feel too chaotic, and it wouldn’t last very long. The mountains would melt away into speed bumps and (since we don’t have any other geologic forces in play) the lowlands would flatten out to an immense and very boring planeWhich would technically also be a plain..

Instead of forming gentle valleys, it’s gouging ever-deepening ruts.
Instead of forming gentle valleys, it’s gouging ever-deepening ruts.

It’s not awesome. It’s too brute a force. Once some erosion (we can think of it like water, although no water is actually depicted) gets down into a crevice, it tends to just dig deeper and deeper, forming these ugly trenches. Also, these trenches have a habit of lining up with the grid and then going perfectly straight, which makes them look just awful.

This was a single mountain, but the rivers have carved everything away, leaving these sheer vertical knife-ridges.
This was a single mountain, but the rivers have carved everything away, leaving these sheer vertical knife-ridges.

So I add a new feature. While the erosion pass has a cell examining its neighbors, I make an average of all of the green channels. (That’s the tributary count.) I stuff this value in the unused blue channel. This is basically “tributary count, except blurred”. This value is now used for erosion when we update the heightmap.

Gentler, more realistic hills. The nasty seam is due to a bug that I eventually got around to fixing. The world is supposed to tile seamlessly. During the normal pass, it wasn’t wrapping values properly when it got to the edge of the world.
Gentler, more realistic hills. The nasty seam is due to a bug that I eventually got around to fixing. The world is supposed to tile seamlessly. During the normal pass, it wasn’t wrapping values properly when it got to the edge of the world.

This is basically good enough. It rounds off the sharp edges. There’s a lot more we could do: Maybe we could mess around with river banks, making sure they sought a plausible angle of repose. Maybe if the erosion value of a particular cell got too high then it would no longer be considered a good place to flow, which would make river beds wider when they had a large number of tributaries. We could fiddle with this stuff all day, but we’ve basically met my initial goal of making a sophisticated, multi-pass process run on the GPU.

Close enough, I guess.
Close enough, I guess.

It’s interesting the things you learn like this. It’s the kind of stuff that might not appear in the docs. It’s the stuff you sort of work out through trial-and-error as you learn, and then take for granted when someone else comes along asking questions.

Notes / comments so far:

  1. I got a sense for how much work we can expect to get done in a single frame. Right now doing erosion is pretty expensive, and updating our texture in 256×256 or 512×512 chunks is roughly the sweet spot on my machine. This doesn’t mean I know how it will run on other machines, or tell me how other shaders might perform, but at least I have SOME frame of reference.
  2. I got a sense of when the GPU starts clamping and truncating values: Only at the last possible moment. You can have nonsense color values (like negative numbers, or gigantic values) or non-normalized normals, or other data that’s supposedly invalid. The shader programs will tolerate whatever nonsense you throw at them until the moment you draw a pixel to the screen.
  3. Converting between floating-point values and single-byte values is safe and reliable. You can take the extracted color value (which runs from 0.0 to +1.0) and multiply by 255. Treat it like an int, do whatever you like with it. As long as you divide by 255 before to send the color value to the screen, you’ll get out exactly what you put in. I wasn’t sure if there was other clipping going on that would make this difficult, but this entire erosion system would have collapsed if this conversion was the slightest bit unreliable.
  4. Having said that, I really wish I could query the color values directly as ints. Right now the GPU looks up the color value, converts it to a floating point value, then I convert it back to an int so I can use it, then I convert it back to a float so GLSL will accept it, then the GPU converts it back to a single bytle value. We’re doing four conversions, when zero would do.

We’re not done yet. More shader shenanigans to follow.

 

Footnotes:

[1] I experimented with using all 8 ordinal directions, but to my surprise it looked worse.

[2] Which would technically also be a plain.



From The Archives:
 

87 thoughts on “Frontier Rebooted Part 4: Stuck in a Rut

  1. Tse says:

    Hm, maybe momentum of the water should be taken into account. Rivers tend to straighten up with time, especially when more water has collected.

    1. Norman Ramsey says:

      Sometimes rivers turn into oxbows.

    2. Paul Spooner says:

      If the steepness of the terrain is taken into account, you could do this by averaging the normals of the immediate tributary cells and using that vector to weight where the downstream erosion goes. Maybe make another texture to store this new normal map, and cache it for convenience, since it isn’t going to be changing much over time.

    3. Arkady says:

      I seem to remember being taught that rivers get twistier over time. The outside edge of a curve has faster moving water and erodes more, while the inside edge has slower moving water and so deposits more. Thus the river gets windier and windier forming meanders.

      1. Depends on the slope, I think. Steep fast rivers get straighter, flat slow rivers meander, turn into oxbows etc.

    4. TMTVL says:

      If you’re going to bother with hydrodynamics, then there’s little reason not to add geology as well.
      Different types of soil and rock erode at different speeds, tectonic plates push mountains up, volcanic activity can create or enlarge islands,…

      1. Abnaxis says:

        Correct me if I’m wrong, but my understanding is that geologic formation happens on a vastly longer time-scale than erosion.

        Looking at it this way, the original noise that generates the height-map is supposed to be the simulation of tectonic plates and volcanic activity. It is a “given,” as the base terrain that comes from the millennia of floating rock-plates jostling each other around. The erosion code then simulates the centuries of water and wind that eat away the exposed rock–a long time, but still a blink of an eye in geological time.

        Fundamentally, that’s why this works and looks good without going into super detailed simulation. While it might be more realistic to try and simulate all that geological stuff, it looks at least as well to the naked eye when you pick a proper function to generate all the mountains randomly, and it’s a lot less computationally intensive. And honestly, that poor GPU is stressed enough…

        1. Aitch says:

          Still, I doubt it will ever look completely correct without different erosion rates for different strata. Sounds like a tough problem to work around, but as it stands everything seems to act like dirt instead of types of rock….

          Btw, I love these posts. It totally blows my mind what Shamus can do with this stuff, and saddens me deeply that it isn’t being properly utilized in the mainstream. Even the random tree generator, for example.

        2. Moridin says:

          “Correct me if I'm wrong, but my understanding is that geologic formation happens on a vastly longer time-scale than erosion.”

          Depends. Himalayas for example are raising at about the rate of 10mm per year, which isn’t all that much slower than erosion except perhaps that caused by fast-moving water on a soft soil.

          1. Yeah. I think it would be fair to say that in most places, most of the time, erosion is happening much faster than geologic formation. But on the other hand, when the geological stuff gets active it can be far the other way around. That’s why everything isn’t flat.
            In the Himalayas you’ve got two plates still in the process of smoorging together, so it’s squishing up those mountains at a decent clip.

  2. Cuthalion says:

    Oooo, I like it. The first erosion shot was a pretty cool example.

    Now, here I am as one of the first few posters, and nothing interesting to say!

    How do you decide where to throw the raindrops? Uniformly? Randomly? In sweeps? Does the landscape end up looking different if you change that?

    1. droid says:

      This GPU implementation does one raindrop per cell, simultaneously updating all cells at once. A CPU implementation might just choose randomly and run one drop at a time.

      1. Geebs says:

        Yeah, trying to do things using random numbers in the GPU is very expensive in terms of cycles and a totally wasted effort for simulating rainfall (and only ever pseudorandom). In the past I’ve tried generating textures of random numbers on the CPU and uploading them to simulate patterns of raindrops, and it made no difference beyond ruining the framerate. Multiplying your rainfall value by height or the direction of the normal (to simulate prevailing wind) is easy enough, though.

  3. sensi277 says:

    Erosion is cool and all, but the more features in this project, the better. Remember, this isn’t going to be some sort of commercial product, this is more of a learning experience for Shamus so that he can continue to make a different project that is (probably) going to be a commercially sold game. More features means more things to learn. More things to learn means a better understanding of the shader language which means Project Good Robot becomes the best it can possibly be.

    1. Jay says:

      I’m pretty sure Good Robot’s problem isn’t ugly terrain. GR’s problem is that it isn’t fun. The solution to that involves a whole lot of tweaking of level designs and enemy behavior to build a gradually rising challenge, with appropriate tension spikes, breathers, and enough variation to stay fresh.

      Judging by the number of technically-proficient-but-not-actually-fun games on the market, coding the basic functionality is the easy part.

      1. Felblood says:

        Yes, yes; this is the end goal.

        However before he can start gathering the useful amounts of playtest data to tweak those levels, he needs to get the graphics engine to work on his testers’ machines.

        That means figuring out what is the problem with his shaders and fixing it.

        Which, in turn, means learning more about the guts of the shaders, hence this learning project.

        There are a lot of projects that got published before they were really ready because the team ran into something like this, and opted not to invest this kind of time to take the problem apart and find a way to attack it like this.

  4. MichaelGC says:

    D’oh. I was looking forward to Plain Plane Simulator 2015.

    1. lucky7 says:

      What definition of plane are you using?

      1. ET says:

        Obviously, it’s supposed to be Plain Plain Plain Simulator 2015. Duh. :P

        1. Ravens Cry says:

          You might even say it is plainly Plain Plain Plain Simulator 2015.

  5. Ilseroth says:

    Reading this, it makes me tempted to learn how to build my own engine.

    I have been working on the unity engine the past few days, figuring it out, and the restrictions from using it is… frustrating.

    Granted as I mentioned, only been at it for a few days and prior to that I haven’t touched coding in a long time (and I’m not particularly proficient at modeling, rigging, animating, or texturing either.

    Implementing animations into Unity is irritating the hell out of me; though blender isn’t exactly playing kind either. Spent 5 hours this morning fiddling with an animation and having my model suddenly flip around psychotically when I tried to add a second animation state (which I had done a few days prior with no issue)

    1. megabyte says:

      Hang in there! I haven’t worked with unity as much as you have, but I know it has its idiosyncrasies that take time to learn and overcome. I nearly pulled my hair out trying to get Eclipse to compile and run my android programs for class.

    2. Moddington says:

      A koan for you:

      A programmer once had a problem.
      His tools were slow, restrictive, and unreliable.
      So he said to himself, “I’ll write my own tools!”
      Now the programmer had two problems.

      1. Alexander The 1st says:

        Related, since game engines are apparently trying their hardest to become the standard for game development at this point.

  6. Decius says:

    Avalanche: If the gradient is too steep, fall down, and iterate.

    1. Paul Spooner says:

      Vegetation: Arrests erosion, but doesn’t grow where there’s too little water, or too MUCH water. Also gives you an excuse to make a GPU procedural tree generator.

      1. rofltehcat says:

        Type and size of vegetation could also be influenced by gradient and amount of water. Plus altitude.

    2. Ah, yes, that’s what this thing is missing: Talus slopes.

  7. Paul Spooner says:

    Ooh ooh ooh! Erosion isn’t dependent on number of tributaries alone, but also on the current steepness of the terrain (as you say in your initial description)! Fortunately, you have access to this, in your normal map! The steeper the terrain, and the more tributaries, the more erosion occurrs. Conversely, the more level the terrain (and the more tributaries) the more deposition happens. I can’t tell if you’re already taking this into account (your initial description seems to indicate that you are), but if you aren’t, it should be an easy addition.

    It is also dependent on the “hardness” of the terrain. I would toss the blue “blur” value, and turn it into a “hardness” number. Maybe populate it initially with some random strata (based on the initial height of the terrain…

    ooh! Even better! Keep a copy of the initial terrain height for reference. Generate a list of strata (some hard, resisting erosion, and some soft, eroding quickly). If the terrain height is above the initial terrain height, this means it is loosely deposited material (silt, sand, etc) laid down during the erosion simulation, and easily eroded. Otherwise, it’s “original material” and uses the strata hardness map, or however you want to go about it. You could make the strata completely level, or make them offsets from the initial terrain height. Lots of interesting geological stuff will come out of this.

    Bonus points, use the hardness as absorbtivity as well, so that softer surfaces absorb water, reducing the effective number of tributaries, and rapidly accumulating deposition. This just might give you the “winding river” thing that happens in nature, as rivers choke on their own silt and change course.

    A final change you could make is to have the “number of tributaries” value stored by exponential instead of linear. Since you care about a large number of points, and only need a rough count, you could have 1=1, 2=4, 3=8 and so on.
    Where X is the tributary value, and N is the integer value stored in the texture:
    Currently: X = N
    Proposed: X = 2^N
    You’d have to decide how to round the numbers (does 2+4 equal 4 or 8?) but it seems like a valuable improvement. Ooh! Golden ratio! Instead of using 2 as a base, use 1.618, so that
    X = 1.618^N
    This way rounding becomes trivial, and you still get the benefit of exponential storage without loosing too much granularity.

    1. Felblood says:

      For simplicity/less need to store data, you could declare that the hardness of the soil is proportionate to the reciprocal of the difference between the original height and the current height (literally divide new erosion values by the total of all past erosion).

      This should be cheap, but still yield terrain that feels natural and organic. You can get the rough shape lined out in just a few brutal passes, and then it automatically becomes more gentle as the time comes to work out the finer details.

      That should make it easier to create terrain that doesn’t look like the crazy rock formations of the American Southwest.

  8. Zukhramm says:

    Seems like fun stuff. I might try following along at home as far as I can.

  9. Volfram says:

    Please forgive me if I say anything wrong here, as I’ve never done GPU programming, but…

    The original algorithm basically drops a raindrop on each point, then moves it around until it can’t move anymore, at which point it deposits what was previously picked up.

    What if…

    You could use an RGB texture, store the current X and Y values as R and G. Start them at the initial raindrop point. Every time a raindrop passes over a vertex in your heightmap, reduce the value a little bit, pick the lowest neighbor, and update the R and G values, repeat for 256 iterations or until an autodetect system determines there are no further places to erode to.

    I’m not sure that would be faster or better than what you’ve currently got. Just tossing ideas out. It’s interesting that the new system would work so oddly, since the last time it was implemented, it worked really well.

    Maybe you could also tell the simulation to stop after a certain number of steps, or the value moved(amount of height lost from the “from” and deposited on the “to”) could be proportionate to the height difference between the two. Maybe use the B channel to store material scooped up, and it scoops up more on steep slopes and deposits more on shallow ones. Maybe you could use Alpha as a “speed” value, which increases for steep(over 45 degree) slopes and decreases for shallow ones, and use that to determine how much material is scooped up and put down?

  10. Csirke says:

    I only have some experience with shaders, but I wouldn’t trust note 2. unless it’s specified somewhere, because that definitely sounds like something that will break first time you try it with different configurations.

    I mean I don’t know what you mean exactly by “color values”, of course vec4 variables can have any valid float values, if that’s only what’s happening here.

    EDIT: As for 4., I think you just need to change your sampler2D or equivalent to usampler2D, to get unsigned integer values 0-255. (Or “i” for signed.)

    1. The Snide Sniper says:

      Note #2 is valid because every variable is floating point.

      Read from a texture? vec4 (of floats)
      Output color? vec4 (of floats)
      Everything in between (unless you do something silly)? floats.

      I would not, however, trust that you’d get consistent behavior if you decide to return one of those nonsense values. I haven’t thoroughly read any of the shading language manuals/help bits I’ve found, but the behavior of “nonsense” colors may be implementation-defined (i.e. unsafe).

      It’s note #4 I’d disagree with. None of what Shamus has described so far seems like it would even benefit from conversion to an integer.
      Integers are wonderful things. They’re fast, they’re simple, and they allow you to pull out all the bit-twiddling hacks you desire. That said, few shading languages actually support those advantages; I’ve heard that some implementations even implement ints as floats in hardware, with the distinction in the language existing only to make programmers happy.

      1. Csirke says:

        Well, for #4, in this case Shamus does several iterations, so the values are stored in textures, as color values, between iterations. So wanting to get them as 0-255 values instead of 0.0-1.0 floats is reasonable I think.

        1. Richard says:

          Not really.
          Unless you actually need to do something on actual equality rather than greater-than/less-than, you should just use floats throughout the shaders.

          I think you’re still thinking like a CPU, along the lines of “Integer fast, float-point slow”.

          That simply doesn’t apply to GPUs.

          GPUs are stupendously fast at floating-point arithmetic – single-precision floats are their native format and so they are faster than integers.
          (Double-precision floats are a performance hit of course)

          Textures are usually converted to floats (along with everything else) and processed as such within shaders for this reason.

          The bonus is of course that you can get a gamut boost for free should somebody use a GPU and display system with more than 8 bits per colour.
          (Well, in the shaders anyway. Your source textures aren’t changing.)

      2. Abnaxis says:

        Shamus is testing the values for equality. Points that are lower get a -1, points that are higher get a +1, points that are equal get a zero (basically).

        You don’t do straight equality tests on floats derived from ints, unless you want funky headache-inducing bugs that are hard to reproduce.

        Of course, you can write the test to make up for it (test abs(A-B)<.001 for an effective equality test), but it's easier and clearer to just convert it into an int first, since that's really what you're working with.

        That’s why the conversion makes sense to me, at least.

  11. Akri says:

    So would this program actually let you sit and watch the landscape erode if you felt like staring at the screen long enough, or is it more of a way to create a realistic-looking landscape based on how the erosion would occur?

    1. Ilseroth says:

      That would be Door #2.

    2. Shamus says:

      You can basically watch it as it happens, although at the current speed it’s very much like watching ice melt.

      1. Grescheks says:

        Is it just me, or does that sound like something that would be kinda cool to watch if it was sped up a bit? I’m feeling another Shamus Screensaver (TM) (or possibly just a cool looking program if it’s too GPU intensive) in the making for when this project is over (or if Shamus gets stuck and needs a bit of a break).

        1. Jakale says:

          I imagine a faster version, if he puts in uplift and other terrain changing systems in, to be similar to the Time Machine description of watching the land changing under and around you.

      2. Paul Spooner says:

        I’m assuming you can just turn up the rate to make it more visible? Does this cause other problematic side-effects?

  12. Lachlan the Mad says:

    Actually, as a geology Honours student, I have to say that this is pretty close to how our erosion computer simulations work, except that instead of choosing random points the program does an “all points at once” kind of thing mathematically.

    One of my fellow students is working on a new way of measuring erosion though. He says that if it works we won’t need the mathematical methods any more, which to be honest is fine by me, because the programs that run them are a pain in the ass to use. All the best user interface technology from 1990, and they chug like a steam train even on a computer with decent RAM.

    1. The Right Trousers says:

      How would you model and quantify erosion without using mathematics?

      1. ET says:

        I too, am curious. Did they mean that they’d be using a different set of equations from some other science, which run faster on computers?

    2. Paul Spooner says:

      “this is pretty close to how our erosion computer simulations work, except that instead of choosing random points the program does an “all points at once” kind of thing mathematically.”
      I’m pretty sure Shamus’ implementation is doing all points at once as well. It’s probably not quite as computationally robust, but who knows.

      “One of my fellow students is working on a new way of measuring erosion though. He says that if it works we won't need the mathematical methods any more”
      Wait, first you talked about computer simulations, then you’re talking about measuring erosion. Measuring it in the computer simulation? And then you won’t need mathematical methods… for computer simulations? Does the computer use a numerical method instead of an arithmetic one? Do you stack computers in the shape of the terrain you’re trying to model and the pour water on them and measure which ones short out? I’m so confused.

      1. Lachlan the Mad says:

        Um, the reason I’m not being very clear about it is because he isn’t here and I’m worried about intellectual property and stuff, but to explain it as clearly as I can without being specific about the tech; He is looking for a chemical which would normally be distributed throughout the soil, but which would be preferentially moved by the rain. Measure quantities pre- and post-rainfall to get a good picture of how the soil is moved by rain.

        And I’ve realised that yes, I’m probably overselling his thing. I don’t really know a lot about it, I just think that what I understand is really cool. I also realise now that it’s not particularly fair to compare a method of measuring real change to a simulation — this method couldn’t be predictive, after all. All in all I’m in over my head here. Sorry.

        1. Paul Spooner says:

          Aah, interesting! So, it is actually measurement of real life erosion.
          I think it’s not as irrelevant as you seem to imply.
          Computer simulations need to rest on existing understanding, which in turn rely on actual measurements. An advance on the empirical front could inform improvements in simulation which in turn could (as you say) produce better predictions.

          That said, isn’t the chemical just calcite? Easily absorbed from soil, deposited as limestone? How is he planning on differentiating mechanical erosion from chemical erosion, when using a chemical trace?

          1. Lachlan the Mad says:

            It’s actually a radioisotope, and I really feel like I’m pushing the IP of the project by saying that.

  13. MichaelG says:

    Can you distribute your demos as you work this time? One of the things you wanted to learn was why code didn’t run on all platforms. It would be better to learn that as you go.

    1. Volfram says:

      I second this, and not just because I want a chance to look at it.

      I mean yeah, I want to look at this thing, but that’s not the only reason.

    2. Paul Spooner says:

      Shamus makes interesting code, but he has trouble releasing it and getting it distributed… needs a publisher perhaps?

  14. Abnaxis says:

    Yah for another, possibly unnecessary optimization!

    My understanding is, that you’re shoving every single point into the GPU and processing them in parallel.

    Couldn’t you only shove every odd/even point into the GPU, and each point update the texture of their neighbors? Or does that run into parallel memory access problems?

    1. Geebs says:

      Forgive my very low level of understanding, but you basically get a performance advantage in your fragment shader (erosion simulations are a fragment shader-type problem) by sampling texels adjacent to each other, so that you can maximise cacheing in your texture look-ups, and ensuring that the same operations happen to adjacent fragments, which works well with GPU-style parallelism.

      This means that doing a section of a texture per frame is the more efficient approach, although it’s more complicated than that because if you try to read from and write to the same texture in s shader, Bad Things happen.

  15. Shamus, wouldn’t you save a lot of cycles by merging the erosion texture with the terrain texture prior?

    Load the terrain map, calculate the erosion, apply the erosion to the terrain map. This can be done at load time of the game/level.

    The downside is that you can see “realtime” corrosion if you wanted, then again if you want to do stuff like that then one might as well have a fade to black between time leaps or a “1 sec slideshow” or similar.

    The key to performance whether it is on a CPU or a GPU is to set up and prepare as much as possible (even at the cost of some memory) before you enter a loop, the loop should be as tight as possible, especially if it’s a “FPS” type of loop.

    Terrain erosion is so slow that watching ice melt is way faster. If this is a game then the player would need to “sleep” decades or centuries each “night” to see a terrain change when they wake up.
    If this is a erosion simulator then it’s not a game any more and you might need to rethink how things are done and may need to look at OpenCL to do the simulation calculations if you want to be able to do a decade worth of erosion per 1 frame rendered etc.

    Unless you plan to do a streaming open world then loading and calculating everything when loading the level/game is just fine, sure a loading bar is annoying to look at but if 10 seconds of looking at a loading bar means that you don’t get horrible framerates then that is worth it.

    Even if you do a streaming open world you still can pre-load/calculate almost everything, after all you do not plan to make a open streaming world with erosion simulation and player deformable terrain, right!?

    I’m also curious about the terrain texture/normal map. With a 24bit texture RGB is used, but could you not use a 32bit texture and use the RGB for he normal map and the alpha fore the erosion? (then again this may just be more complicated and only saves a little disk space/memory and could cost more cycles if you need to convert it to use it).

    Also, are you truly limited to a “8bit” normal map? Is a 16bit, 24 bit or even 32bit float possible?

    Oh and can you use mipmapping to shave off some cycles? No point in doing distant high details that can’t be seen when medium or low detail is enough.

    Also, are you looking at doing any http://en.wikipedia.org/wiki/Parallax_mapping ?

    1. Moridin says:

      “Also, are you truly limited to a “8bit” normal map? Is a 16bit, 24 bit or even 32bit float possible?”

      Like Richard said above, GPUs are at their best when processing single-precision floats. Double-precision calculations are possible, but it about halves the performance – and that’s if you’re using a workstation card. Since most games don’t really need a lot of double-precision calculations and since those are quite vital for many “real-world” applications, gaming cards have crippled double-precision performance – half or even a quarter of what the GPU is really capable of – so they wouldn’t accidentally compete with the workstation cards which are significantly more expensive(for example, the 290x has recommended price of $550. FirePro w9100 using the exact same GPU but with full double-precision performance and workstation drivers available is selling for $4000). I don’t know about 24 or 32 bit calculations, but I suspect they would be much more slower still.

      1. double-precision floats? I never mentioned that-
        32bit float is single precision float and 64bit float is double precision float.
        You aren’t confusing 16bit and 32bit floats?

        Also note that what Richard wrote was after me, I am unable to see into the future (yet).

        I do agree with him tough, floating point is wonderful, in the digital world floats are the analog equivalent.
        One could use 0.0 to be sealevel and 1.0 the edge of vacuum/space with -1.0 being the core of the planet.
        Single precision/32bit float has a precision of 24bits which should be enough hopefully.

        The key to floating point is to keep all calculations/loops float as well, converting to/from integer and float is wasteful.
        The initial input and the final output is only what need to be integer, this will avoid float rounding errors too. Although modern GPUs can probably render floating point pixels too? If so only the input is integer although that might be avoidable as well.

  16. scope.creep says:

    Apparently I have a fundamental misunderstanding sort of thing going on. Why do we need to worry about framerates at all? Isn’t this something that would all happen behind a loading screen or similar? Generate the game terrain based on some seed value (procedural generation FTW), plop in whatever game play objects we want, need, saved, etc., and then ready-set-play your game on the now-unchanging* terrain?

    I suppose that the question ultimately is similar to Akri’s.

    * Granted, some sort of Slartibartfast’s Happy Funtime Fjord Creator game could be amusing in its own right.

    1. Shamus says:

      Remember that my goal was just to take some processing-heavy thing that I already understood (erosion) and move it to the GPU. So, I made terrain that slowly erodes while the program runs.

  17. Alex says:

    You could implement your original CPU erosion pretty easily in a compute shader. This is a bit easier in the dx world since compute is directly integrated in the dx api, but in the OGL world you can use OpenCL to do the same thing.

    There are a few ways you could do this, but I’d probably assign one thread per “drop”. CS provides atomic operations to modify the height map and you could synchronize the threads at each simulation step without leaving the compute shader. That has to be faster than multiple passes.

    1. Richard says:

      It’d be the same, as OpenCL would also be doing multiple passes. This is simply abstracted away so you don’t have to marshal the data yourself.

      The moment you “synchronise” data on a GPU, you’ve added an extra pass because a GPU simply cannot access the result of the computation of another invocation of a shader until the entire pass completes.

      Think of the GPU as an office block full of a few hundred to thousands of workers, each in a private office with a few of those pneumatic tubes and an abacus.

      The CPU throws a load of work and data into the pneumatic tubes.

      The GPU workers open their tubes, grab the abacus and do the calculations.
      Then they put the result in the pneumatic tube and send it back.

      When everybody has finished and replied, the next lot of work and data is sent out – nobody can do anything more until everyone has finished, they’re not allowed to get ahead of each other.

      They GPU workers cannot talk to each other, and their abacus means they can’t remember anything between invocations either.

      All each worker gets is what comes down the tubes.

  18. Abnaxis says:

    I just thought of something…

    Does the “erosion texture” ever change? Every operation you do after you calculate it doesn’t modify the height-ordering of the points, does it? You just make peaks pointy-er and troughs deeper. Do you need to re-compute the texture for every iteration? Have you ever looked into how the texture (well, the raw one that isn’t averaged for the rut problem, at least) changes from iteration to iteration?

    1. Paul Spooner says:

      I think it does. It will slowly fill in high-altitude bowls, which will then “overflow” and turn into river beds. If the erosion direction texture never changed, all the holes would eventually turn into mountain peaks… which could be interesting looking, but not “correct”.
      That said, the majority of the texture likely doesn’t change. It would be interesting to generate a “water depth” visualization of some kind, as well as a change over time. Differential erosion topography? Hell yes!

      1. Actually, I think this is one weakness of the existing erosion approach: The “water” fills no volume and so can’t overflow a bowl. Dips in a bowl’s edge will still erode a bit because there’s higher stuff to either side, but I don’t see how it can produce a “lake spills over and gouges out a river” effect. Not a major problem given Shamus’ basic objective, which has little to do with painstaking realism in the erosion.

        1. Paul Spooner says:

          True. Though it wouldn’t be too difficult to model this. Just generate a second mesh colored blue to store the water depth. Bonus points for being able to see where water is piling up.
          It’s all a distraction from the point of the exercise though.

  19. Abnaxis says:

    Sorry for the abundance of posts, but I keep thinking of stuff…

    IIRC, most GPUs have optimized matrix-math baked into them, do they not? Would it better to come up with some matrix math to do what you want to do? Because I’m pretty sure you could totally do that…

  20. Ben Deutsch says:

    Yay erosion!

    Here’s an interesting twist, though: what if you wanted to have procedurally generated infinite terrain? As in Minecraft, for example: terrain is organized in chunks (and that’s what I will be calling them for the rest of this comment). Chunks you have not visited yet don’t exist yet (and have not even been thought about) until you move close enough. Then the chunk is generated with the help of random, yet deterministic, noise processes.

    The problem here is that correct erosion would depend on all the surrounding terrain, i.e. to calculate the next chunk’s erosion, you’d have to calculate the surrounding chunks, and then calculate their chunks, and so on.

    I think one would need to have the erosion somehow determined from the “untouched” heightmap in a single step. Then to generate a new chunk, one would calculate the heightmaps of all surrounding chunks, but only erode the one actually needed chunk.

    Oh wait, I just thought of a second method: partially eroded chunks. The erosion on a newly generated chunk would only be “correct” or “final” near the player (or whatever demands new chunks), and absent near the other side. Then when the player moves nearer, complete the erosion with the help of new, further-out chunks.

    Of course, you’d end up with cases where the same potential terrain (say from the same random seed) eroded differently depending on which path you walked across it. But as long as it looks good, I doubt anyone would complain much.

    1. Abnaxis says:

      This post seems relevant…

      1. Ben Deutsch says:

        Good find, but it’s not exactly the same problem. In the post you link, the problem is to have pre-determined rivers flow through chunks as they are generated. As with erosion, the solution for the current chunk depends on the solution for the surrounding chunks, if you’re using the “find a path around the existing hills” approach.

        Which, towards the middle of the post, Shamus admits won’t work.

        The solution in the second half of the post is to make the terrain changes from the river independent of the surrounding chunks. My guess is that if you did that with erosion as written, it would generate seams along the chunks. Hence the need for an adapted algorithm.

        Of course, now I have to reread “Frontier: The Original Series” to see if Shamus actually did solve this last time around :-P

  21. Majromax says:

    There’s one problem with this method of erosion: it’s not resolution-independent.

    Consider some sort of bumpy height-map and erosion on it that produces a plausible pattern. Water runs off of the hills and digs out little valleys, etc.

    Now, double the resolution of that map via interpolation of your choice — water *still* creates valleys, but the crevies are still just one-textel wide. The same “amount” of water (adjusted for resolution) still flows through the channel, meaning the effective water-flow and erosion is double what we’d see on the lower-resolution map.

    This happens because the water in the system (2D quantity) flows through lines (1D quantity). Getting around that is why the entire mess with the blue channel of the texture was needed.

    An alternate idea may be to use a multiresolution approach. Erode first on a downsampled, 32×32 (or somesuch) version of the heightmap, apply those results to the 64×64 version, erode a few more steps, and so on. On each resolution level, we’re only asking for local erosion, but then the physical-space definition of “local” changes.

    That also coinicdentally addresses the problem of “water takes 1024 steps to go from the middle to the edge”, because we’re not asking it to do that – just local changes that can be settled over a few iterations.

    1. Abnaxis says:

      Alright, last dummy code suggestion, I promise…

      I’m personally not sure if the problem is on of resolution, but rather one of discretization. In other words, you’re taking something continuous (the direction of slope on a hill) and pigeonholing it into up/down/left/right.

      To me, a better better solution is to, well, not do that. Instead of using comparisons, use your normal map. For each point, grab the z and x components (in Occulus space) of the normal. That 2D vector should point in the direction of water flow, with a magnitude proportional to how steep the slope is. You can then add and subtract these components from the neighbors, and wind up with your erosion matrix.

      Of course, all that assumes you can add from multiple normals to the same pixel in the erosion texture in parallel, but I’m kinda working under the assumption that the capability for such operations exists…

      1. Abnaxis says:

        Omigosh, I am a dummy. I don’t know why, but I just automatically don’t think in terms of parallelization.

        On each kernel, calculate for point (X,Z): n_x(X-1,Z)*Green(X-1,Z) + n_z(X,Z-1)*Green(X,Z-1) – n_x(X+1,Z)*Green(X+1,Z) – n_z(X,Z+1)*Green(X,Z+1), multiply it by -1 (because no matter what I do I always get the sign on normals backwards), and probably multiply by some other constant because you’re going to get small numbers and you’ll need more granularity.

        That will make the “water” actually flow in the direction it needs to, instead of restricting it to cardinal directions.

        1. Volfram says:

          I’m not certain whether most people have trouble thinking in parallel or not, but I do know that taking a few Verilog classes helps a lot, because logic chips ALWAYS run in parallel.

          On the topic of your algorithm… we do happen to already have a normal map which must be calculated any time the heightmap changes… There is a difficulty, though, in that due to the quantized nature of the environment, each individual water droplet is restricted to either 4 or 8 available directions. Blurring probably helps with that a lot.(Something I hadn’t thought of for my earlier posts elsewhere)

          1. Abnaxis says:

            each individual water droplet is restricted to either 4 or 8 available directions

            Mmmm…not really.

            Don’t think of it like a water droplet. Think of it as a volume of water of undefined size hitting the ground. Some of the water might flow down. There’s nothing saying those upmty-liters of water can’t just spread out in all directions. 30% might go the neighbor on the right, while 10% goes to the neighbor on the left, 20% to the right, and 40% out the bottom. That’s how this algorithm could react if you drop a volume of fluid on top of the pinnacle of a mountain/hill

            1. Volfram says:

              Alright, now you’re onto something interesting. Treating it like that, trying to calculate how much of the water would go in each direction based on the relative slopes to the 8 neighbors, opens up some much more interesting simulation options.

              I’ll have to keep this in mind once my programming schedule frees up to where I can do experiments again.

              1. Abnaxis says:

                Something I think might be an issue, but I’m not sure if it is an issue or not without testing, is that this method doesn’t strictly obey Conservation of Mass. In my example, all the neighbors will remove water from the peak, but there’s no logic in place to make sure the neighbors actually receive that water. If the normal for the peak is pointed straight up, it will just vanish into the ether.

                So yeah, something to watch out for. Not sure how to solve it yet… Maybe add in another four terms that uses the local normal to calculate how much water it took from its neighbors? Don’t have time to think right now

      2. Zukhramm says:

        I don’t see what the distinction between resolution and discretization is here.

        And your solution is still resolution dependent. Anything depending on neighbors is.

        1. Abnaxis says:

          Okay, I’m going to drop some math terms here. This is mostly to demonstrate that I’m not completely talking out of my ass, and also because this post is already going to be long…

          So, first off–from an end-result point-of-view, there’s virtually no difference between sampling at a lower frequency and taking the average, due to the Mean Value Theorem–your way might take a different amount of time, but it will have effectively the same result as Shamus’s way. It will take a lot of math to make the theorem look like what is going on here, but the short of it is that the slopes you calculate from sub-sampling should converge to the same averages Shamus is computing if you make the grid small enough.

          As for my suggestion, I probably spent more time than is healthy making graphs to demonstrate my point. To demonstrate, take this height contour for my example. On the graph, every line represents a constant elevation–lines that are close together means the elevation is changing quickly. I’m going to super-impose arrows on the graph to show where the calculations say “water” is flowing as it goes downhill.

          As the example goes on, remember that arrows that converge on the same point add up–I could put numbers into it, but I already wrote a hundred lines of code for this bad boy :p. In the same vein, please ignore edge arrow goofiness–since I don’t have continuous neighboring tiles, weird stuff happens on the edges.

          Here is the way the “water” is going to flow when you don’t use any averaging or sampling or anything. The important thing to notice is that, since the only options for the water are N, S, E, or W, the flow pretty much runs straight to the center of the plot before taking a hard turn westward. This is going to make a deep, ugly rut, because you have a lot of erosion along a thin line.

          Here is how the plot looks with the method Shamus wound up with, by averaging with neighbors. Not the water still beelines to the center where it abruptly turns west, but the averaging has smoothed the transition at the turn significantly. This is why the end result looks much better. Your sub-sampling method would have the same result.

          Here is the method I am talking about (I finally got smart and dropped edge cases here, cause I was tired of artifacts). Note that the water is always flowing west, as it should be, and that the transition is about as smooth as it’s going to get with the shape I am starting with. The vector field I’ve put forth is the gradient, or at least the gradient as approximated by the normal-map calculations.

          Using your method, if you had an infinitely-sized height-map, from which you took an infinite number of averages/sub-samples, you would essentially be constructing a Taylor Series expansion for finding the gradient. Every sample grid you add will get you closer to the gradient, but by the same token these averages/samples are expensive.

          Using the calculated normal map, on the other hand, is still an approximation (unless you have an infinitely-sized grid from which you use all the points in a Taylor’s expansion yadda yadda…), it’s a much more efficient approximation than doing averages. For one, Shamus has to calculate the normal-map anyway, so it’s re-using work that needs done. For another thing, the normal map calculation converges in fewer iterations than averaging would.

          So yeah…way too many words about something not all that important, but at least I got an excuse to play with contour plots in R :p

          1. Zukhramm says:

            You might want to note that me and Majromax are not the same person.

            1. Abnaxis says:

              Yeah, I missed that. On a second reading, it also seems like I misread your question. I thought you were asking what the practical difference is between discretization and non-discretization was, when you were actually asking what distinction I was making.

              So, to actually answer your question: what Shamus is doing (and what we are talking about) is a rudimentary form of Finite Element Analysis, which is basically “break the space into chunks and apply equations to the chunks so you get the solution in aggregate.” Pretty much every mechanical simulation ever uses a highly advanced version of this technique to simulate airflow/heat conduction/material stress/etc…

              Nobody here is talking about doing something besides “look at neighbors and figure out where the ‘water’ goes.” The original (un-averaged) erosion texture only looked at the nearest four neighbors to come up with an answer. Averaging is, in essence, getting the neighbors of the neighbors involved, so now you have 12 neighbors involved in the calculation. Sub-sampling likewise just gathers more points into the calculations, bringing more and more neighbors into the fold to get a better answer. And, like you said, my method also looks at the nearest four neighbors to come up with a solution.

              This has nothing to do with the “discretization” I am talking about. Yes, no matter what, our calculation is the result of some math done to a finite number of points. The difference between my method and sub-sampling/averaging is that my method gives continuous output (at least as continuous as you can get with a resolution of 1 byte), whereas the original algorithm has a discrete output, which only starts to become continuous after you do a whole bunch of averages.

              In the base, un-averaged method, there are 5 possibilities for where the water goes at any single point–up, down, left, right, or zero. This makes for sharp transitions as you cross the threshold from (say) “water goes north” to “water goes west,” which makes for deep ruts as I showed in my first vector-diagram.

              When you average, you essentially let the output take on more nuanced values. Instead of up, down, left, right, zero, now your flow can also go up-left-up-right, down-left, and down-right in varying degrees. By my count, you’ve increased the number of possibilities for flow from 5 to 100. This leads to smoother transitions and less rutting, as shown in the second plot. If you’re smart about weighting, you can keep including points in your average and you will eventually get something as continuous and realistic as possible within resolution constraints.

              With my method, you have a number of variations equal to (resolution of z-component in normal-map)^2, or 65,536 different possibilities if you’re storing z-values as an 8-bit number. My output isn’t “discretized” in the sense that I’m not putting any constraint on what the output can be. I’m not categorizing my output into up,down,left,right,zero, I’m taking the continuous output of the cross product (which is a continuous formula), which is only limited by the resolution of my floating-point numbers.

              You could still add neighbors to your normal calculation in a similar manner to how you add more points into your average, and that will get you better results than using the closest four points. However, as I’ve shown, you already get vastly better performance from using the normal calculation with four neighbors than sub-sampling with 12, so the normal is going to converge much, much faster, with fewer points required in the calculation even if four neighbors isn’t good enough by itself (four neighbors is probably fine, if it’s good enough for lighting).

              Also, before a mathematician comes in and pukes at me, I know I’m not 100% super-technical-proof-theory-correct here, but I’m hoping to get the general idea across…

              1. Majromax says:

                So, to actually answer your question: what Shamus is doing (and what we are talking about) is a rudimentary form of Finite Element Analysis, which is basically “break the space into chunks and apply equations to the chunks so you get the solution in aggregate.”

                Finite difference, really. Finite element is somewhat better for unstructured grids or when certain discrete equivalents of continuous properties (such as irrotational flow) need to be preserved exactly, but the math here is definitely finite difference with upwinding.

                What I proposed was a variant of a multigrid approach.

                The root problem isn’t the gridded nature of the algorithm, it’s that the algorithm is inconsistent as the resolution increases. Flow is channeled into infinitely-thin (1 textel thin) rivers, so as textels get thinner so do the erosion paths in physical terms.

                Your approach calculates a nicer gradient for descent, but it doesn’t change that entire “converge on the steepest descent” problem.

                1. Abnaxis says:

                  Finite difference, really

                  I stand corrected. Unless you count tricks with axisymmetry, I haven’t done finite analysis in 2D since college, so I forgot the distinction.

                  Flow is channeled into infinitely-thin (1 textel thin) rivers, so as textels get thinner so do the erosion paths in physical terms.

                  We’re both describing symptoms of the same fundamental problem, then suggesting different solutions. Without doing something to it, the flow can never spilt under the naive implementation without averaging. That means you wind up with 1-texel-wide rivulets that dig trenches (although technically they aren’t infinitesimally thin, they’re as wide as the texel spacing).

                  The reason why the flow never splits, is because there are only five possibilities for flow direction–up, down, left, right, or stop. There’s no “half-up, half-left” alternative. That causes problems with convergence that you describe, but it also creates sharp transitions as I describe.

                  Using a multigrid approach solves the problem by (basically) smoothing, but you still have the problem of a finite solution space for each individual texel unless you use a whole lot of grids, which is expensive. My idea is to take the normal map, which already needs to be computed relatively accurately for lighting purposes, and use that to divide the water continuously among neighbors based on which way they’re tilted.

                  I’m not sure why you say the problem is resolution-dependent. No matter what the grid size is, the naive flow will converge to the low s on the grid, then never diverge. Both of our solutions will fix the behavior regardless of grid spacing. Too-low resolution might cause issues of aliasing and other spacial inaccuracies in the solution, but that’s going to happen no matter what type of simulation you use.

  22. Bryan says:

    Could you use the stencil buffer instead of using the G channel of the texture? Each time a point is rendered it increments the stencil buffer value, and then … uh. Hmm. It’d probably have to be a render pass per direction after that, looking at one nearby point (…I’d probably bind a pair of offsets to a uniform vec2) and seeing if it’s pointing “at us”, using the drop builtin if it isn’t, and configuring the stencil buffer to add with each rendered fragment. Or something like that.

    And now I’ve turned a single-pass render into a five-pass render. Probably a lot slower as well (round trip per direction pass), depending on how various cards do the stencil buffering. Hmm. OK maybe not. :-)

  23. Toastgoblin says:

    One thing to make this a little more interesting might be to think about climate and weather. Even on a small scale, some areas will get more rainfall that others, and so erode more. You could presumably do this with (yet another) Rainfall Map that produces areas semi-based on the local geography.

    Wind erosion in a prevailing direction might also be fun.

    If you’re concerned (as a few people mentioned) that the tributary patterns won’t shift much, you could introduce a random chance that sometimes a tributary is ignored, standing in for various factors that could disrupt the flow of water. This would tend to break up and reform tributaries over time, I think.

  24. Abnaxis says:

    I’ve got an idea digging around my head, but I’m not sure if it’s anything or not.

    I know the point of this exercise is only for you to familiarize yourself with shaders, and my new idea (like most of my ideas swimming around in this thread) doesn’t have anything to do with that, but I think I’m on to something useful.

    What noise generating function are you using to generate the original landscape? It’s simplex, right? What smoothing/interpolation function do you use? How many octaves are you sampling?

  25. D-z says:

    A bit late to the party (I read this article when it came out, but hadn’t messed with shaders yet), but you can use integer textures and usampler2d to avoid converting back and forth between int and float.

Thanks for joining the discussion. Be nice, don't post angry, and enjoy yourself. This is supposed to be fun. Your email address will not be published. Required fields are marked*

You can enclose spoilers in <strike> tags like so:
<strike>Darth Vader is Luke's father!</strike>

You can make things italics like this:
Can you imagine having Darth Vader as your <i>father</i>?

You can make things bold like this:
I'm <b>very</b> glad Darth Vader isn't my father.

You can make links like this:
I'm reading about <a href="http://en.wikipedia.org/wiki/Darth_Vader">Darth Vader</a> on Wikipedia!

You can quote someone like this:
Darth Vader said <blockquote>Luke, I am your father.</blockquote>

Leave a Reply to Toastgoblin Cancel reply

Your email address will not be published.