Making Worlds 1 - Of Spheres and Cubes

Let's start making some planets! Now, while this started as a random idea kind of project, it was clear from the start that I'd actually need to do a lot of homework for this. Before I could get anywhere, I needed to define exactly what I was aiming for.

The first step in this was to shop around for some inspirational art and reference pictures. While there is plenty of space art to be found online, in this case, nothing can substitute for the real thing. So I focused my search on real pictures, both of landscapes (terran or otherwise) as well as from space. I found classy shots like these:

Hopefully I'll be able to render something similar in a while. At the same time, I eagerly devoured any paper I could find on rendering techniques from the past decade, some intended for real-time rendering, some old enough to be real-time today. Out of all this, I quickly settled on my goals:

  • Represent spherical or lumpy heavenly bodies from asteroids to suns.
  • With realistic looking topography and features.
  • Viewable across all scales from surface to space.
  • At flight-simulator levels of detail.
  • Rendered with convincing atmosphere, water, clouds, haze.

For most of these points, I found one or more papers describing a useful technique I could use or adapt. At the same time, there are still plenty of unknowns I'll need to figure out along the way, not to mention significant amounts of fudging and experimentation.

The Spherical Grid

To get started I needed to build some geometry, and to do that I needed to figure out what geometry I should use. After reviewing some options, I quickly settled on a regular spherical displacement map (AKA a heightmap). That is, starting with a smooth sphere, move every surface point up or down, perpendicular to the surface, to create terrain on the surface.

If these vertical displacements are very small compared to the sphere radius, this can represent the surface of a typical planet (like Earth) at the levels of detail I'm looking for. If the displacements are of the same order as the sphere radius, you can deform it into very irregular potato-like shapes. The only thing heightmaps can't do is caves, tunnels, overhang and other kinds of holes, which is fine for now.
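
In code, the displacement itself is trivial. A minimal sketch, assuming a height function h(dir) defined on unit direction vectors (the names are mine):

// Displace a point on the unit sphere along its normal.
// 'dir' is a unit direction vector, 'radius' the base sphere radius,
// and h(dir) returns the terrain height at that direction.
function displace(dir, radius) {
  var r = radius + h(dir);
  return [dir[0] * r, dir[1] * r, dir[2] * r];
}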

The big question is, how should the spherical surface be divided up and represented? With a sphere, this is not an easy question, because there is no single obvious way to divide a spherical surface into regular sections or grids. Various techniques exist, each with their own benefits and specific use cases, and I spent quite some time looking into them. Here's a comparison between four different tessellations:

Different tessellations of a sphere. (Source)

Note that the tessellation labeled ECP is just the regular geographic latitude-longitude grid.

The main features I was looking for were speed and simplicity, so I settled on the 'quadcube'. This is where you start with a cube whose faces have been divided into regular grids, and project every surface point out from the middle to an enclosing sphere. This results in a perfectly smooth sphere, built out of 6 identical round shells with curved edges. This arrangement is better known as the 'cube map' and often used for storing arbitrary 360 degree panorama views.

Here's a cube and its spherical projection:

Mapping a cube to a sphere. The projected cube edges are indicated in red. Note that the resulting sphere is perfectly smooth and round, even though the grid has a bulgy appearance.

Cube maps are great, because they are very easy to calculate and do not require complicated trigonometry. In reverse, mapping arbitrary spherical points back onto the cube is even simpler and in fact natively supported by GPUs as a texture mapping feature.

This is important, because I'll be generating the surface terrain and texture dynamically and will need to index and access each surface 'pixel' efficiently. Using a cube map, I simply identify the corresponding face, and then index it using x/y coordinates on the face's grid.
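
In pseudo-JavaScript, the two mappings look roughly like this (a sketch with my own naming, not the engine's actual C++):

// Cube to sphere: project a point on the cube surface onto the unit
// sphere by normalizing it.
function cubeToSphere(p) {
  var len = Math.sqrt(p[0] * p[0] + p[1] * p[1] + p[2] * p[2]);
  return [p[0] / len, p[1] / len, p[2] / len];
}

// Sphere to cube: the dominant axis picks the face, and dividing by it
// yields x/y grid coordinates on that face, each in [-1, 1].
function sphereToCube(p) {
  var ax = Math.abs(p[0]), ay = Math.abs(p[1]), az = Math.abs(p[2]);
  if (ax >= ay && ax >= az) return { face: p[0] > 0 ? '+X' : '-X', x: p[1] / ax, y: p[2] / ax };
  if (ay >= az)             return { face: p[1] > 0 ? '+Y' : '-Y', x: p[0] / ay, y: p[2] / ay };
  return                           { face: p[2] > 0 ? '+Z' : '-Z', x: p[0] / az, y: p[1] / az };
}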

The downside of cube maps is that the distance and area between points vary across the grid, which makes it harder to perform certain operations uniformly on the surface. However, these area distortions are much smaller than those of e.g. a lat-long grid, where the grid spacing actually approaches zero near the poles. What's more, the distortions made by a cube map are the exact opposite of those you get with a regular perspective projection. This makes it easy to render into cube maps, which will be useful for texture generation.

Level of Detail

There's another reason I picked the cube map approach, and that has to do with the level of detail requirements. My goal is to make a planet that can be viewed from the ground, the air as well as from space. It would be incredibly slow to always render everything at maximum detail, so I need to adaptively add and remove detail as the viewer gets closer to the surface.

However, increasing the level of detail uniformly across the entire sphere is not enough, because I only want to render detail where the viewer will see it. To a viewer on the ground, most of the planet is hidden by the horizon, and the engine should be able to effectively cut away the unseen pieces, so no wasteful processing takes place.

It is here that I get a huge benefit from the cube map layout of the sphere, because it lets me apply the well-researched realm of grid-based flat terrain rendering with only minor adjustments. Specifically, I am using a 'chunked LOD' approach. Every face of the cube map becomes a quadtree, with each level splitting four ways to form the next level with more detail:

Quadtree terrain. (Source)

The chunks for the various levels of detail are all loaded into GPU memory, ready to be accessed at any time. When the terrain has to be rendered, the engine walks down the quad-tree, determines the appropriate level-of-detail for each section, and outputs the list of chunks to be rendered for a particular frame. Then, the GPU does its work, blasting through each chunk at a blistering pace, leaving the CPU to do other things.

Configuration of chunks to render

Because all the data is already in memory, changing the level of detail just means rendering a different set of chunks. Each chunk has the same geometrical complexity, and performance is directly proportional to how many are rendered on screen. More detail means more chunks, but that usually also means you can cut away pieces of the terrain that are far away.

The chunked approach is also very easy to work with, because there is no data dependency between the different chunks. Each chunk has a copy of its own vertex data, which means individual chunks can be paged in and out of GPU memory at will. This is important for keeping memory usage down while still being able to scale to massive sizes.
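
In rough pseudo-JavaScript, the selection walk could look like this (a sketch only; visible(), screenError() and threshold stand in for the engine's actual tests):

function selectChunks(node, camera, out) {
  // Skip chunks that are over the horizon or outside the viewing cone.
  if (!visible(node, camera)) return;

  // If this chunk is too coarse for its distance to the camera, recurse
  // into the four children; otherwise render it at this level of detail.
  if (screenError(node, camera) > threshold && node.children) {
    for (var i = 0; i < 4; i++) selectChunks(node.children[i], camera, out);
  }
  else {
    out.push(node.chunk);
  }
}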

Putting It All Together

At this point, I have all the pieces in place to render an adaptive sphere mesh. This is what it looks like (sorry, the video capture is a bit jerky):

The detail increases as the camera gets closer to the sphere and shifts around the surface as it moves.

Far from being a little coding experiment, it actually took me quite some time to get to this point, because I was learning OGRE, sharpening my C++ skills, as well as researching the techniques to use.

The next step is to look at generating heightmaps and textures for the surface.

References

The techniques I used were pioneered by people smarter and older than me, I'm just building my own little digital machine with them.


My JS1K Demo - The Making Of

If you haven't seen it yet, check out the JS1K demo contest. The goal is to do something neat in 1 kilobyte of JavaScript code.

I couldn't resist making one myself, so I pulled out my bag of tricks from my Winamp music visualization days and started coding. I'm really happy with how it turned out. And no, it won't work in Internet Explorer 8 or less.

Edit: OH SNAP! I just rewrote the demo to include volumetric light beams and still fit in 1K:

Original Version

Improved Version

Now, whenever size is an issue, the best way to make a small program is to generate all data on the fly, i.e. procedurally. This saves valuable storage space. While this might seem like a black art, often it just comes down to clever use of (high school) math. And as is often the case, the best tricks are also the simplest, as they use the least amount of code.

To illustrate this, I'm going to break down my demo and show you all the major pieces and shortcuts used. Unlike the actual 1K demo, the code snippets here will feature legible spacing and descriptive variable names.

Initialization

JS1K's rules give you a Canvas tag to work with, so the first piece of code initializes it and makes it fill the window.

From then on, it just renders frames of the demo. There are four major parts to this:

  • Animating the wires
  • Rotating and projecting the wires into the camera view
  • Coloring the wires
  • Animating the camera

All of this is done 30 times per second, using a normal setInterval timer:

setInterval(function () { ... }, 33);

Drawing Wires

The most obvious trick is that everything in the demo is drawn using only a single primitive: a line segment of varying color and stroke width. This allows the whole drawing process to be streamlined into two tight, nested loops. Each inner iteration draws a new line segment from where the previous one ended, while the outer iteration loops over the different wires.

The lines are blended additively, using the built-in 'lighter' mode, which means they can be drawn in any order. This avoids having to manually sort them back-to-front.

To simplify the perspective transformations, I use a coordinate system that places the point (0, 0) in the center of the canvas and ranges from -1 to 1 in both coordinates. This is a compact and convenient way of dealing with varying window sizes, without using up a lot of code:

with (graphics) {
  ratio = width / height;
  globalCompositeOperation = 'lighter';
  scale(width / 2 / ratio, height / 2);
  translate(ratio, 1);
  lineWidthFactor = 45 / height;
  ...

I also add a correction ratio for non-square windows and calculate a reference line width lineWidthFactor for later.

Then there's the two nested for loops: one iterating over the wires, and one iterating over the individual points along each wire. In pseudo-code they look like:

For (12 wires => wireIndex) {
  Begin new wire
  For (45 points along each wire => pointIndex) {
    Calculate path of point on a sphere: (x,y,z)
    Extrude outwards in swooshes: (x,y,z)
    Translate and Rotate into camera view: (x,y,z)
    Project to 2D: (x,y)
    Calculate color, width and luminance of this line: (r,g,b) (w,l)
    If (this point is in front of the camera) {
      If (the last point was visible) {
        Draw line segment from last point to (x,y)
      }
    }
    else {
      Mark this point as invisible
    }
    Mark beginning of new line segment at (x,y)
  }
}

Mathbending

To generate the wires, I start with a formula which generates a sinuous path on a sphere, using latitude/longitude. This controls the tip of each wire and looks like:

offset = time - pointIndex * 0.03 - wireIndex * 3;
longitude = cos(offset + sin(offset * 0.31)) * 2
          + sin(offset * 0.83) * 3 + offset * 0.02;
latitude = sin(offset * 0.7) - cos(3 + offset * 0.23) * 3;

This is classic procedural coding at its best: take a time-based offset and plug it into a random mish-mash of easily available functions like cosine and sine. Tweak it until it 'does the right thing'. It's a very cheap way of creating interesting, organic looking patterns.

This is more art than science, and mostly just takes practice. Any time spent with a graphical calculator will definitely pay off, as you will know better which mathematical ingredients result in which shapes or patterns. Also, there are a couple of things you can do to maximize the appeal of these formulas.

First, always include some non-linear combinations of operators, e.g. nesting the sin() inside the cos() call. Combined, they are more interesting than when one is merely overlaid on the other. In this case, it turns regular oscillations into time-varying frequencies.

Second, always scale different wave periods using prime numbers. Because primes have no factors in common, this ensures that it takes a very long time before there is a perfect repetition of all the individual periods. Mathematically, the least common multiple of the chosen periods is huge (414253 units ~ 4.8 hours; the prime factors 31, 83, 23 and 7 behind the constants multiply out to exactly 414253). Plotting the longitude/latitude for offset = 0..600 you get:

a pseudo-random set of oscillations

The graph looks like a random tangled curve, with no apparent structure, which makes for motions that never seem to repeat. If however, you reduce each constant to only a single significant digit (e.g. 0.31 -> 0.3, 0.83 -> 0.8), then suddenly repetition becomes apparent:

a not so pseudo-random set of oscillations

This is because the least common multiple has dropped to 84 units ~ 3.5 seconds. Note that both formulas have the same code complexity, but radically different results. This is why all procedural coding involves some degree of creative fine tuning.

Extrusion

Given the formula for the tip of each wire, I can generate the rest of the wire by sweeping its tail behind it, delayed in time. This is why pointIndex appears as a negative in the formula for offset above. At the same time, I move the points outwards to create long tails.

I also need to convert from lat/long to regular 3D XYZ, which is done using the spherical coordinate transform:

spherical coordinates

Source: Wikipedia

distance = sqrt(pointIndex + 0.2);
x = cos(longitude) * cos(latitude) * distance;
y = sin(longitude) * cos(latitude) * distance;
z = sin(latitude) * distance;

You might notice that rather than making distance a straight up function of the length pointIndex along the wire, I applied a square root. This is another one of those procedural tricks that seems arbitrary, but actually serves an important visual purpose. This is what the square root looks like (solid curve):

square root

The dotted curve is the square root's derivative, i.e. it indicates the slope of the solid curve. Because the slope goes down with increasing distance, this trick has the effect of slowing down the outward motion of the wires the further they get. In practice, this means the wires are more tense in the middle, and more slack on the outside. It adds just enough faux-physics to make the effect visually appealing.

Rotation and Projection

Once I have absolute 3D coordinates for a point on a wire, I have to render it from the camera's point of view. This is done by moving the origin to the camera's position (X,Y,Z), and applying two rotations: one around the vertical (yaw) and one around the horizontal (pitch). It's like spinning on a wheely chair, while tilting your head up/down.

x -= X; y -= Y; z -= Z;

x2 = x * cos(yaw) + z * sin(yaw);
y2 = y;
z2 = z * cos(yaw) - x * sin(yaw);

x3 = x2;
y3 = y2 * cos(pitch) + z2 * sin(pitch);
z3 = z2 * cos(pitch) - y2 * sin(pitch);

The camera-relative coordinates are projected in perspective by dividing by Z — the further away an object, the smaller it is. Lines with negative Z are behind the camera and shouldn't be drawn. The width of the line is also scaled proportional to distance, and the first line segment of each wire is drawn thicker, so it looks like a plug of some kind:

plug = !pointIndex;
lineWidth = lineWidthFactor * (2 + plug) / z3;
x = x3 / z3;
y = y3 / z3;

lineTo(x, y);
if (z3 > 0.1) {
  if (lastPointVisible) {
    stroke();
  }
  else {
    lastPointVisible = true;
  }
}
else {
  lastPointVisible = false;
}
beginPath();
moveTo(x, y);

Coloring

Each line segment also needs an appropriate coloring. Again, I used some trial and error to find a simple formula that works well. It uses a sine wave to rotate overall luminance in and out of the (Red, Green, Blue) channels in a deliberately skewed fashion, and shifts the R component slowly over time. This results in a nice varied palette that isn't overly saturated.

pulse = max(0, sin(time * 6 - pointIndex / 8) - 0.95) * 70;
luminance = round(45 - pointIndex) * (1 + plug + pulse);
strokeStyle='rgb(' +
  round(luminance * (sin(plug + wireIndex + time * 0.15) + 1)) + ',' +
  round(luminance * (plug + sin(wireIndex - 1) + 1)) + ',' +
  round(luminance * (plug + sin(wireIndex - 1.3) + 1)) +
  ')';

Here, pulse causes bright pulses to run across the wires. I start with a regular sine wave over the length of the wire, but truncate off everything but the last 5% of each crest to turn it into a sparse pulse train:

sine pulse train

Camera Motion

With the main visual in place, almost all my code budget is gone, leaving very little room for the camera. I need a simple way to create consistent motion of the camera's X, Y and Z coordinates. So, I use a neat low-tech trick: repeated interpolation. It looks like this:

sample += (target - sample) * fraction

target is set to a random value. Then, every frame, sample is moved a certain fraction towards it (e.g. 0.1). This turns sample into a smoothed version of target. Technically, this is a one-pole low-pass filter.

This works even better when you apply it twice in a row, with an intermediate value being interpolated as well:

intermediate += (target - intermediate) * fraction
sample += (intermediate - sample) * fraction

A sample run with target being changed at random might look like this:

a sample run of repeated interpolation

You can see that with each interpolation pass, more discontinuities get filtered out. First, jumps are turned into kinks. Then, those are smoothed out into nice bumps.

In my demo, this principle is applied separately to the camera's X, Y and Z positions. Every ~2.5 seconds a new target position is chosen:

if (frames++ > 70) {
  Xt = random() * 18 - 9;
  Yt = random() * 18 - 9;
  Zt = random() * 18 - 9;
  frames = 0;
}

function interpolate(a,b) {
  return a + (b-a) * 0.04;
}

Xi = interpolate(Xi, Xt);
Yi = interpolate(Yi, Yt);
Zi = interpolate(Zi, Zt);

X  = interpolate(X,  Xi);
Y  = interpolate(Y,  Yi);
Z  = interpolate(Z,  Zi);

The resulting path is completely smooth and feels quite dynamic.

Camera Rotation

The final piece is orienting the camera properly. The simplest solution would be to point the camera straight at the center of the object, by calculating the appropriate pitch and yaw directly off the camera's position (X,Y,Z):

yaw   = atan2(Z, -X);
pitch = atan2(Y, sqrt(X * X + Z * Z));

However, this gives the demo a very static, artificial appearance. What's better is making the camera point in the right direction, but with just enough freedom to pan around a bit.

Unfortunately, the 1K limit is unforgiving, and I don't have any space to waste on more 'magic' formulas or interpolations. So instead, I cheat by replacing the formulas above with:

yaw   = atan2(Z, -X * 2);
pitch = atan2(Y * 2, sqrt(X * X + Z * Z));

By multiplying X and Y by 2 strategically, the formula is 'wrong', but the error is limited to about 45 degrees and varies smoothly. Essentially, I gave the camera a lazy eye, and got the perfect dynamic motion with only 4 bytes extra!

Addendum

After seeing the other demos in the contest, I wasn't so sure about my entry, so I started working on a version 2. The main difference is the addition of glowy light beams around the object.

As you might suspect, I'm cheating massively here: rather than do physically correct light scattering calculations, I'm just using a 2D effect. Thankfully it comes out looking great.

Essentially, I take the rendered image, and process it in a second Canvas that is hidden. This new image is then layered on the original.

I take the image and repeatedly blend it with a zoomed out copy of itself. With every pass the number of copies doubles, and the zoom factor is squared every time. After 3 passes, the image has been smeared out into an 8 (= 2³) 'tap' radial blur. I lock the zooming to the center of the 3D object. This makes the beams look like they're part of the 3D world rather than drawn on later.

For additional speed, the beam image is processed at half the resolution. As a side effect, the scaling down acts like a slight blur filter for the beams.
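
The core of the effect fits in a few lines. A sketch, assuming a hidden canvas blur with 2D context btx and the beam center at (cx, cy); the names and the starting zoom factor are mine:

var zoom = 1.05;
btx.globalCompositeOperation = 'lighter';
for (var pass = 0; pass < 3; pass++) {
  // Blend the canvas additively with a scaled copy of itself, anchored
  // on the beam center. Each pass doubles the number of effective copies.
  btx.save();
  btx.translate(cx, cy);
  btx.scale(zoom, zoom);
  btx.translate(-cx, -cy);
  btx.drawImage(blur, 0, 0);
  btx.restore();

  // Square the zoom so the 8 taps end up evenly spaced along the beam.
  zoom *= zoom;
}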

Unfortunately, this effect was not very compact, as it required a lot of drawing mode changes and context switches. I had no room for it in the source code.

So, I had to squeeze out some more room in the original. First, I simplified the various formulas to the bare minimum required for interesting visuals. I replaced the camera code with a much simpler one, and started aggressively shaving off every single byte I could find. Then I got creative, and ended up recreating the secondary canvas every frame just to avoid switching back its state to the default.

Eventually, after a lot of bit twiddling, a version came out that was 1024 bytes long. I had to do a lot of unholy things to get it to fit, but I think the end result is worth it ;).

Closing Thoughts

I've long been a fan of the demo scene, and fondly remember Second Reality in 1993 as my introduction to the genre. Since then, I've always looked at math as a tool to be mastered and wielded rather than subject matter to be absorbed.

With this blog post, I hope to inspire you to take the plunge and see where some simple formulas can take you.

Wiki TL;DR – WikiLeaks Reader

Wiki TL;DR is an extension for Safari and Chrome. It replaces the drab data dumps of WikiLeaks' Cablegate with richly formatted pages optimized for reading.

Abbreviations are expanded, text is reflowed, a map is added and the entire page is laid out with a clean design. Security clearances and message priorities are indicated. A summary is included on top.

It's also got a variety of formatting rules. So far the majority of cables work perfectly, but feedback is welcome. Source code is available on github.

Note: this extension only works if you access WikiLeaks using the wikileaks.ch domain, instead of wikileaks.org. I'll update it soon.

Making Worlds 3 - That's no Moon...

It's been over two months since the last installment in this series. Oops. Unfortunately, while trying to get to the next stage of this project, I ran into some walls. My main problem is that I'm not just creating worlds, but also learning to work with the Ogre engine and modern graphics hardware in particular.

This presents some interesting challenges: between my own code and the pixels on the screen, there are no less than three levels of indirection. First, there's Ogre, a complex piece of C++ code that provides me with high-level graphics tools (i.e. objects in space). Ogre talks to OpenGL, which abstracts away low-level graphics operations (i.e. commands necessary to draw a single frame). The OpenGL calls are handed off to the graphics driver, which translates them into operations on the actual hardware (processing vertices and pixels in GPU memory). Given this long dependency chain, it's no surprise that when something goes wrong, it can be hard to pinpoint exactly where the problem lies. In my case, an oversight and misunderstanding of an Ogre feature led to several days of wasted time and a lot of frustration that made me put aside the project for a while.

With that said, back to the planets...

Normal mapping

Last time, I ended with a bumpy surface, carved by applying brushes to the surface. The geometry was there, but the surface was still just solid white. To make it more visually interesting, I'm going to apply light shading.

The most basic information you need for shading a surface is the surface normal. This is the vector perpendicular to the surface at a particular point. For flat surfaces, the normal is the same everywhere. For curved surfaces, the normal varies continuously across the surface. Typical materials reflect the most light when the surface normal points straight at the light source. By comparing the surface normal with the direction of incoming light (using the vector dot product), you can get a good measure of how bright the surface should be under illumination:

Schematic representation of surface shading with normals. Lighting a surface using its normals.

To use normals for lighting, I have two options. The first is to do this on a geometry basis, assigning a normal to every triangle in the planet mesh. This is straightforward, but ties the quality of the shading to the level of detail in the geometry. A second, better way is to use a normal map. You stretch an image over the surface, as you would for applying textures, but instead of color, each pixel in the image represents a normal vector in 3D. Each pixel's channels (red, green, blue) are used to describe the vector's X, Y and Z values. When lighting the surface, the normal for a particular point is found by looking it up in the normal map.

The benefit of this approach is that you can stretch a high resolution normal map over low resolution geometry, often with almost no visual difference.

Schematic representation of surface shading with normals. Lighting a low-resolution surface using high-resolution normals.

Here's the technique applied to a real model:

Normal mapping in practice (Source - Creative Commons Share-alike Attribution)

Normal mapping helps keep performance up and memory usage down.

Finding Normals

So how do you generate such a normal map, or even a single normal at a single point? There are many ways, but the basic principle is usually the same. First you calculate two different vectors which are tangent to the surface at the point in question. Then you use the cross product to find a vector perpendicular to the two. This third vector is unique and will be the surface normal.

For triangles, you can pick any two triangle edges as vectors. In my case, the surface is described by a heightmap on a sphere, which makes things a bit trickier and requires some math.

I asked my friend Djun Kim, Ph.D. and teacher of mathematics at UBC for help and he recommended Calculus on Manifolds by Michael Spivak. This deceptively small and thin book covers all the basics of calculus in a dense and compact way, and quickly became my new favorite reading material.

Differential Geometry

In this section, I'll describe the formulas needed to calculate the normals of a spherical heightmap. Unlike what I've written before, this will dive shamelessly into specifics and not eschew math. The reason I'm writing it down is because I couldn't find a complete reference online. If math scares you, this section might not be for you. Scroll down until you reach the crater, or take a detour by reading A Mathematician's Lament by Paul Lockhart, which will make you feel better about it.

First, we're going to derive normals for a regular flat terrain heightmap. To start, we need to define the terrain surface. Starting with a 2D heightmap, i.e. a function f(u,v) of two coordinates that returns a height value, we can create a 3 dimensional surface g:

g(u, v) = (u, v, f(u, v))

We can use this formal description to find tangent and normal vectors. A vector is tangent when its direction matches the slope of the surface in a particular direction. Differential math tells us that slope is found by taking the derivative. For functions of multiple variables, that means we can find tangent vectors along curves of constant v or constant u. These curves are the thin grid lines in the diagram. To do this, we take partial derivatives with respect to u (with v constant) and with respect to v (with u constant). The set of all partial derivatives is called the Jacobian matrix J, whose rows form the tangent vectors tu and tv, indicated in red and purple:

tu = ∂g/∂u = (1, 0, ∂f/∂u)
tv = ∂g/∂v = (0, 1, ∂f/∂v)

The cross product of tu and tv gives us the surface normal: n = tu × tv = (−∂f/∂u, −∂f/∂v, 1).

When applied to a discrete heightmap, the function f(u,v) is a 2D array map[u][v], and the partial derivatives at the end have to be replaced with something else. We can use finite differences to approximate the slope of the surface by differencing neighbouring samples:

∂f/∂u ≈ (map[u+1][v] − map[u−1][v]) / 2
∂f/∂v ≈ (map[u][v+1] − map[u][v−1]) / 2

This result and the formula for n are usually provided as-is in terrain mapping guides, without going through the full process of finding tangents first. However, it's important to use the Jacobian matrix formulation once you switch to spherical terrain.

Mapping a cube to a sphere

To make a sphere, we add an additional function k which warps the flat terrain into a spherical shell. Each shell is the result of warping a single face of the cubemap and covers exactly 1/6th of the sphere. In what follows, we'll only consider a single face and its shell.

We designate the intermediate pre-warp coordinates (s,t,h), and the final post-warp coordinates as (x,y,z):

w = √(s² + t² + 1)
k(s, t, h) = (x, y, z) = (h·s/w, h·t/w, h/w)

The principle behind the spherical mapping is this: first we take the vector (s,t,1), which lies in the base plane of the flat terrain. We normalize this vector by dividing it by its length w, which has the effect of projecting it onto the sphere. Then we multiply the resulting vector by the terrain height h to create the terrain on the sphere's surface.

Just like with the function g(u,v) and J(u,v), we can find the Jacobian matrix J(s,t,h) of k(s,t,h). Because there are 3 input values for the function k, there are 3 tangents, along curves of varying s (with constant t and h), varying t (constant s and h) and varying h (constant s and t). The three tangents are named ts, tt, th.

ts = ∂k/∂s = (h·(t² + 1), −h·s·t, −h·s) / w³
tt = ∂k/∂t = (−h·s·t, h·(s² + 1), −h·t) / w³
th = ∂k/∂h = (s, t, 1) / w

PS: If your skills at differentiation are a bit rusty, remember that Wolfram Alpha can do it for you.

How does this help? The three vectors describe a local frame of reference at each point in space. Near the edges of the grid, they get more skewed and angular. We use these vectors to transform the flat terrain's frame of reference into the right shape, so we can construct the correct 90 degree angle between the warped surface and its normal at each point.

In mathematical terms, we multiply the 'flat' partial derivatives by the Jacobian matrix. This is similar to the chain rule for regular derivatives, only for multiple variables.

That is, to find the partial derivatives (i.e. tangent vectors) of the final spherical terrain with respect to the original terrain coordinates u and v, we can take the flat terrain's tangents tu and tv and multiply them by J(s,t,h). Once we have the two post-warp tangents, we take their cross product, and find the normal of the spherical terrain:

tu′ = J(s,t,h) · tu
tv′ = J(s,t,h) · tv
n = tu′ × tv′

It's important to note that this is not the same as simply multiplying the flat terrain normal with J(s,t,h). J(s,t,h)'s rows do not form a set of perpendicular vectors (it is not an orthogonal matrix), which means it does not preserve angles between vectors when you multiply by it. In other words, J(s,t,h) · n, with n the flat terrain normal, would not be perpendicular to the spherical terrain. This is why it's important to return to the basic calculus underneath, so we can get the correct, complete formula.
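
To make the formulas concrete, here is the whole chain as a numerical sketch, in JavaScript rather than the engine's C++, assuming a heightmap lookup f(u, v) on a single cube face:

function sphereNormal(u, v, delta) {
  // Flat tangents via finite differences: (1, 0, fu) and (0, 1, fv).
  var fu = (f(u + delta, v) - f(u - delta, v)) / (2 * delta);
  var fv = (f(u, v + delta) - f(u, v - delta)) / (2 * delta);
  var h = f(u, v);

  // Partial derivatives of the warp k(s, t, h) = (h/w) * (s, t, 1),
  // with w = sqrt(s*s + t*t + 1), evaluated at (u, v, h).
  var w = Math.sqrt(u * u + v * v + 1), w3 = w * w * w;
  var ks = [h * (v * v + 1) / w3, -h * u * v / w3, -h * u / w3];
  var kt = [-h * u * v / w3, h * (u * u + 1) / w3, -h * v / w3];
  var kh = [u / w, v / w, 1 / w];

  // Post-warp tangents: J(s,t,h) applied to (1, 0, fu) and (0, 1, fv).
  var tu = [ks[0] + kh[0] * fu, ks[1] + kh[1] * fu, ks[2] + kh[2] * fu];
  var tv = [kt[0] + kh[0] * fv, kt[1] + kh[1] * fv, kt[2] + kh[2] * fv];

  // The normal is the cross product of the two tangents (not normalized).
  return [
    tu[1] * tv[2] - tu[2] * tv[1],
    tu[2] * tv[0] - tu[0] * tv[2],
    tu[0] * tv[1] - tu[1] * tv[0]
  ];
}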

Thus ends the magical math adventure. If you read it all the way through, cheers!

No Wait, It is a Moon.

With the normal map in place, I can now render the planet's surface and get a realistic idea of what it looks like. To show this off, I tweaked the brush system a bit: instead of using the literal brush image (e.g. a smooth, round crater), the brush is distorted with fractal noise. It makes every application of the brush subtly different from the next, and saves me from manually drawing e.g. a hundred different craters.

Brush distortion.
Here's a side by side comparison of the original brush and a distorted version.

Currently I've only implemented one type of distortion, which lends a rocky appearance to the surface. With that in place, my engine can now generate somewhat realistic looking moon surfaces. Here's the demo:

References

The techniques I used were pioneered by people smarter and older than me, I'm just building my own little digital machine with them.

Kindle Faux PDF Zoom

Through the miracle of xmas, I acquired a Kindle. A sleek e-reader, but also a shameless vehicle for Amazon's digital book store. With the latest firmware installed, they do make for great PDF readers... in theory.

Kindle PDF fail

The good news is that the e-ink display on the Kindle is indeed pretty sweet. It works so well that the screen looks positively fake when it's not changing, as if it was just a display item in a shop somewhere. But the bad news is that the software needs a lot of love.

The included PDF reader for example has no zoom option. All you can do is toggle between portrait and landscape. Either way, normal sized text ends up tiny and barely readable.

Thankfully, we can still do it ourselves. Armed with PyPDF I wrote a simple script that takes a regular A4/Letter PDF and chops each page into four parts. You can pan through the document just by hitting next. Most of the stuff I read these days is academic, in the classic two column paper format, so this orders the sub-pages to match that.

The script is available for download. It requires Python and pyPdf. Usage:

python rekindle.py file.pdf

It will produce file.kindle.pdf. The code doesn't actually look at the contents, it just cuts blindly, so it might need adjustment for certain docs.

Kindle Zoom

Using Web APIs for Research

Recently we launched our new product at Strutta, a 'create your own contest site' web service. In each contest, users submit and vote on each other's videos, pictures, songs or writings.

As part of the research we did for the development, we wanted to examine our competition. So, I dove into YouTube to try and figure out some of their ideas and algorithms. For me, this wasn't entirely new: when I posted my Line Rider videos to YouTube, I followed up each video with manual statistics tracking and gained some insight into how a video becomes popular on YouTube. However, that only gave me a very narrow view of the community and its dynamics.

Since then though, things have changed a lot. YouTube now has a public API as well as pre-made libraries to use. With these, it becomes very easy to collect statistics and perform your own analysis. So, armed with Python, I set out to investigate YouTube's ubiquitous 'related videos' feature.

Six Degrees of YouTube

I found it interesting to analyse a big site through their own API rather than screen scraping. Traditionally, one first tries to collect as much data as possible, but the resulting data set can become very unwieldy. In this case, I already had full access and I could focus on exactly which queries I wanted to run, how to aggregate my data, and which measures to focus on.

The results revealed some interesting conclusions. My big write-up can be found on the Strutta Blog, aptly titled Six Degrees of YouTube.

The Reality of Illegal TV Downloads

As you may know, I'm a sci-fi nerd, hence I've been pretty excited about the reimagined Battlestar Galactica series coming to a close. So, me and my fellow connoisseur of the awesome, Greg, put together a quick survey on Google Docs to get predictions about the end of the show. The internets filled it in.

The Battlestar nerdery was all in good fun, but more interestingly, I also asked a question about how people watch the show: via live broadcast, recorded or downloaded? Legally or illegally? Depending on your point of view, these results are either entirely obvious, or quite surprising. So far, 313 people filled in the survey, which was advertised only through blogs and Twitter for two days:

How techie people watch TV

Given the circumstances, the people who answered this fit two descriptions. One, they are fan enough to actually fill out a survey about a show on TV. Two, they read blogs, talk on Twitter, hang out in forums, i.e. they know and use the web intimately. So, SciFi channel SyFy, NBC Universal, all big name media: do you see that big green chunk of people who download your shows illegally? These are merely potential customers that you haven't reached yet.

When you look at your ratings and bemoan the dwindling numbers, think of that 30%. Sure, these people are probably not getting you any ad money, but you can profit off them indirectly. Tech savvy people are the backbone of your nerdy fandom, and they add value to your precious intellectual property. Who do you think helped all those girlfriends, husbands, parents or siblings get over the silly name 'Battlestar Galactica' and actually watch the show? Who wrote all that stuff on the Battlestar wiki in their spare time, providing an anchor for online discussions and activity, keeping your brand active? Yup that's right, the nerds with their computers.

And seriously, those 30% aren't all anarchistic hacker types who despise copyright. A lot of them are just people who want to enjoy the show they love in the way that is convenient for them. An illegal high-definition torrent released a couple hours after the TV broadcast is indeed pretty darn convenient. Lucky for you, you are in the unique position of offering something even better.

Just stop treating the live broadcast as being sacred: it is merely one showing after the content has been made available. Instead, provide your own episode downloads at the same time as the TV broadcast. Make it attractive with additional extras for your hungry audience, like director commentary or deleted scenes. By all means offer a free ad-supported plan like Hulu for those who don't mind having their shows and brains invaded by rabid commercialism.

But please, open up the modestly priced option of high quality, ad-free, DRM-free downloads. The technology is there. If you do it right, you will go from making no money off of these people, to making some money off of them. Trust me, this group is only getting bigger by the day. Of course, it won't be easy: your inaction has caused an entire ecosystem of illegal distribution to spring up across IRC, Usenet, private trackers and the web. These people are organised and very good at what they do. Your competition is tough.

In this light, it's a bit silly to try and push region-restricted, delayed and limited online releases onto people. You're only providing a product worse than what's already available. You're clinging to your old ways, and only succeeding because a lot of this stuff is darn new and only the kids are doing it anyway.

Except, even the grown up folks around me are starting to figure it out. Some run Boxee on their cracked Apple TV (a plug and play process). Some have a dedicated torrent box at home that they log into (screen sharing built into their Mac). They've got phones in their pockets that can network literally anywhere, and look, someone can sell them an app to torrent their shows for a buck or two. See how this whole digital economy stuff works?

Do you honestly think you can control all that with increasingly restrictive DRM backed by increasingly restrictive law? Time's arrow points to our point of view becoming the dominant one. Either reinvent yourself, or continue losing. It's your move, TV.

Making Worlds 2 - Scaling Heights

Last time, I had a working, smooth sphere mesh. The next step is to create terrain.

Scale

Though my goal is to render at a huge range of scales, I'm going to focus on views from space first. That strongly limits how much detail I need to store and render. Aside from being a good initial sandbox in terms of content generation, it also means I can comfortably keep using my current code, which doesn't do any sophisticated memory or resource management yet. I'd much rather work on getting something interesting up first rather than work on invisible infrastructure.

That said, this is not necessarily a limitation. The interesting thing about procedural content is that every generator you build can be combined with many others, including a copy of itself. In the case of terrain, there are definite fractal properties, like self-similarity at different levels of scale. This means that once I've generated the lowest resolution terrain, I can generate smaller scale variations and combine them with the larger levels for more detail. This can be repeated indefinitely and is only limited by the amount of memory available.

Perlin Noise is a celebrated classic procedural algorithm, often used as a fractal generator.

Height

To build terrain, I need to create heightmaps for all 6 cube faces. Shamelessly stealing more ideas from Spore, I'm doing this on the GPU instead of the CPU, for speed. The GPU normally processes colored pixels, but there's no reason why you can't bind a heightmap's contents as a grayscale (one channel) image and 'draw' into it. As long as I build my terrain using simple, repeated drawing operations, this will run incredibly fast.

In this case, I'm stamping various brushes onto the sphere's surface to create bumps and pits. Each brush is a regular PNG image which is projected onto the surface around a particular point. The luminance of the brush's pixels determines whether to raise or lower terrain and by how much.


Three example brushes from Spore. (source)

However, while the brushes need to appear seamless on the final sphere, the drawing area consists only of the straight, square set of cube map faces. It might seem tricky to make this work so that the terrain appears undistorted on the curved sphere grid, but in fact, this distortion is neatly compensated for by good old perspective. All I need to do is set up a virtual scene in 3D, where the brushes are actual shapes hovering around the origin and facing the center. Then, I place a camera in the middle and take a snapshot both ways along each of the main X, Y and Z directions with a perfect 90 degree field of view. The resulting 6 images can then be tiled to form a distortion-free cube map.

Rendering two different cube map faces. The red area is the camera's viewing cone/pyramid, which extends out to infinity.
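
In sketch form, the camera setup is just a table of six snapshots (my own shorthand, not Ogre's actual API). Any consistent choice of 'up' vectors works, as long as the same convention is used when sampling the cube map later:

// One camera direction per cube face, each rendered with a 90 degree
// field of view and a square aspect ratio so the images tile seamlessly.
var faces = [
  { name: '+X', dir: [ 1, 0, 0], up: [0, 1, 0] },
  { name: '-X', dir: [-1, 0, 0], up: [0, 1, 0] },
  { name: '+Y', dir: [0,  1, 0], up: [0, 0, 1] },
  { name: '-Y', dir: [0, -1, 0], up: [0, 0, 1] },
  { name: '+Z', dir: [0, 0,  1], up: [0, 1, 0] },
  { name: '-Z', dir: [0, 0, -1], up: [0, 1, 0] }
];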

To get started I built a very simple prototype, using Ogre's scene manager facilities. I'm starting with just a simple, smooth crater/dent brush. I generate all 6 faces in sequence on the GPU, pull the images back to the CPU to create the actual mesh, and push the resulting chunks of geometry into GPU memory. This is only done once at the beginning, although the possibility is there to implement live updates as well.

Here's a demo showing a planet and the brushes that created it, hovering over the surface. I haven't implemented any shading yet, so I have to toggle back and forth to wireframe mode so you can see the dents made by the brushes:

The cubemap for this 'planet' looks like this when flattened. You can see that I haven't actually implemented real terrain carving, because brushes cause sharp edges when they overlap:

The narrow dent on the left gets distorted and angular where it crosses the cube edge. This is a normal consequence of the cubemapping, as it looks perfectly normal when mapped onto the sphere in the video.

Engine Tweaks

The demo above also incorporates a couple of engine improvements. With a real heightmap in place, I can implement real level-of-detail selection. That means the resolution of any terrain tile is decided based on how much detail would be lost if a simpler tile was used. The flatter a tile, the less detail is necessary. This ensures complex geometry is used only on those sections that really need it. This is great for visual fidelity, but causes a lot of geometry to pop up if sharp ridges are present in the terrain. In this case, my rendering engine was happily trying to push 700k triangles through the GPU per frame. While even my laptop GPU can actually do that at pretty smooth frame rates nowadays, some optimizations are in order to give me some breathing room.

The culprit was that I wasn't really doing any early removal of geometry that was hidden or otherwise out of frame. To fix that, I now do visibility checks together with the level-of-detail selection. First it checks if a chunk is over the horizon or not before considering it for selection. This is easy to calculate and eliminates a lot of unnecessary drawing, especially when looking straight down. If that first visibility check passes, I perform a tighter check using the camera's viewing cone. With these two measures in place, I'm only averaging about 50,000-100,000 triangles visible per frame, with room for more optimization. These optimizations only remove geometry that's already off screen, so there is no visual difference.
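
The horizon test itself is cheap. Here's a sketch of the geometry (my own code, ignoring terrain height):

// A chunk in unit direction 'chunkDir' from the planet center is beyond
// the horizon when the angle between it and the camera, measured at the
// center, exceeds acos(R / d), with R the planet radius and d the
// camera's distance from the center.
function overHorizon(chunkDir, cameraPos, R) {
  var d = Math.sqrt(cameraPos[0] * cameraPos[0] +
                    cameraPos[1] * cameraPos[1] +
                    cameraPos[2] * cameraPos[2]);
  var cosAngle = (chunkDir[0] * cameraPos[0] +
                  chunkDir[1] * cameraPos[1] +
                  chunkDir[2] * cameraPos[2]) / d;
  return cosAngle < R / d;
}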

Cubemap Seams and Dilation

When rendering into cube maps, each side is rendered independently. In theory each face should match perfectly with adjacent ones due to the way they've been created. In practice however, slight mismatches can occur due to rounding errors at the edges, creating seams. This can be fixed by explicitly copying one pixel-wide edges from one face into the adjacent ones, until they all match up.

The next big step is to start shading the surface, but in order to do that I need to be able to run filters on the cube map. Specifically, I need to be able to compare neighbouring height samples anywhere on the surface. In the straightforward cubemap scenario this is non-trivial, because neighbouring samples at the edges need to be fetched from different cube faces at different orientations in space.

I decided to implement something I call 'dilated cubemaps'. I've never really heard this described formally, though I doubt it's never been thought of before:

Instead of every face neatly matching with the next, I dilate the cube faces so they stick through each other. At the same time, I use a larger texture size to compensate, and I adjust the field-of-view of the rendering camera to match. If done right, the resulting cubemap is a pixel-perfect expanded version of the undilated map.

The dilated cubemap provides reliable neighbouring samples for all samples in the original cube map up to a distance as wide as the new border. Unlike regular cubemap wrapping, the dilated regions are distorted to conform to the current face's grid. This matches the real change in grid direction that occurs on the final sphere mesh and lets you sample exactly across cube map edges.
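
One way to pin down the matching field of view, for a face of N×N pixels dilated by a border of b pixels on each side (my derivation, not necessarily the engine's exact code):

// Pixels on an undilated face are spaced 2/N apart in tangent units, so
// b extra border pixels extend the half-width from 1 to 1 + 2*b/N, while
// the texture grows from N to N + 2*b pixels to compensate.
var fov = 2 * Math.atan(1 + 2 * b / N);  // radians; b = 0 gives the usual 90 degrees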

I played with the cubemap dilation because I was thinking of some complicated filters to run that require regular grids (like CFD). But in retrospect, I probably don't need the exact spacing of sample points at the edges for this, so regular undilated cubemapping will probably do. Still, it's good to have around, and certainly was an interesting exercise in pixel-exact rendering.

What Next?

With basic heightmap generation in place, I can now start putting in some 'tech artist' time to play with various brushes and drawing behaviour. Lighting and shading is another big one and should provide a massive improvement to the visuals.

Right now I've taken a week between postings, though it remains to be seen whether I can maintain that. Creating these blog entries is turning into a pretty time consuming endeavour, especially as I get into territory where I have to make my own diagrams and illustrations.

References

The techniques I used were pioneered by people smarter and older than me, I'm just building my own little digital machine with them.


On TermKit

I've been administering Unix machines for many years now, and frankly, it kinda sucks. It makes me wonder, when sitting in front of a crisp, 2.3 million pixel display (i.e. a laptop) why I'm telling those pixels to draw me a computer terminal from the 80s.

And yet, that's what us tech nerds do every day. The default Unix toolchain, marked in time by the 1970 epoch, operates in a world where data is either binary or text, and text is displayed in monospace chunks. The interaction is strictly limited to a linear flow of keystrokes, always directed at only one process. And that process is capable of communicating only in short little grunts of text, perhaps coalescing into a cutesy little ASCII art imitation of things that grown-ups call "dialogs", "progress bars", "tables" and "graphs".

The Unix philosophy talks about software as a toolset, about tiny programs that can be composed seamlessly. The principles are sound, and have indeed stood the test of time. But they were implemented in a time when computing resources were orders of magnitude smaller, and computer interaction was undiscovered country.

In the meantime, we've gotten a lot better at displaying information. We've also learned a lot of lessons through the web about data interchange, network transparency, API design, and more. We know better how small tweaks in an implementation can make a world of difference in usability.

And yet the world of Unix is rife with jargon, invisible processes, traps and legacy bits. Every new adept has to pass a constant trial by fire, of not destroying their system at every opportunity it gives them.

So while I agree that having a flexible toolbox is great, in my opinion, those pieces could be built a lot better. I don't want the computer equivalent of a screwdriver and a hammer, I want a tricorder and a laser saw. TermKit is my attempt at making these better tools and addresses a couple of major pain points.

I see TermKit as an extension of what Apple did with OS X, in particular the system tools like Disk Utility and Activity Monitor. Tech stuff doesn't have to look like it comes from the Matrix.

Rich Display

It's 2011, and monospace text just doesn't cut it anymore. In the default ANSI color palette, barely any of the possible color combinations are even readable. We can't display graphs, mathematical formulas, tables, etc. We can't use the principles of modern typography to lay out information in a readable, balanced way.

So instead, I opted for a front-end built in WebKit. Programs can display anything that a browser can, including HTML5 media. The output is built out of generic widgets (lists, tables, images, files, progress bars, etc.). The goal is to offer a rich enough set for the common data types of Unix, extensible with plug-ins. The back-end streams display output to the front-end, as a series of objects and commands.

I should stress that despite WebKit it is not my intent to make HTML the lingua franca of Unix. The front-end is merely implemented in it, as it makes it instantly accessible to anyone with HTML/CSS knowledge.

Pipes

Unix pipes are anonymous binary streams, and each process comes with at least three: Standard In, Standard Out and Standard Error. This corresponds to the typical Input > Processing > Output model, with an additional error channel. However, in actual usage, there are two very different scenarios.

One is the case of interactive usage: a human watches the program output (from Std Out) on a display, and types keystrokes to interact with it (into Std In). Another case is the data processing job: a program accepts a data stream in a particular format on Std In, and immediately outputs a related data stream on Std Out. These two can be mixed, in that a chain of piped commands can have a human at either end, though usually this implies non-interactive operation.

These two cases are shoehorned into the same pipes, but happen quite differently. Human input is spontaneous, sporadic and error prone. Data input is strictly formatted and continuous. Human output is ambiguous, spaced out and wordy. Data output is conservative and monolithic.

As a result, many Unix programs have to be careful about data. For example, many tools dynamically detect whether they are running in interactive mode, and adjust their output to be more human-friendly or computer-friendly. Other tools come with flags to request the input/output in specific formats.

This has led to "somewhat parseable text" being the default interchange format of choice. This seems like an okay choice, until you start to factor in the biggest lesson learned on the web: there is no such thing as plain text. Text is messy. Text-based formats lie at the basis of every SQL injection, XSS exploit and encoding error. And it's in text-parsing code where you'll likely find buffer overflows.

What this means in practice is that in every context, there are some forbidden characters, either by convention or by spec. For example, no Unicode or spaces in filenames. In theory, it's perfectly fine, but in practice, there's at least one shell script on your system that would blow up if you tried. Despite the promise of text as the universal interchange format, we've been forced to impose tons of vague limits.

So how do we fix this? By separating the "data" part from the "human" part. Then we can use messy text for humans, and pure data for the machines. Enter "Data In/Out", "View In/Out".

The data pipes correspond to the classical Std pipes, with one difference: the stream is prefixed with MIME-like headers (Content-Type, Content-Length, etc). Of these, only the 'Content-Type' is required. It allows programs to know what kind of input they're receiving, and handle it graciously without sniffing. Aside from that, the data on the pipe is a raw binary stream.
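
For example, a file listing travelling down a data pipe could start like this (the Content-Type is real per the above; the length value is just illustrative):

Content-Type: application/json; schema=termkit.files
Content-Length: 524

[ ... raw JSON data follows ... ]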

The view pipes carry the display output and interaction to the front-end. Widgets and UI commands are streamed back and forth as JSON messages over the view pipes.

The real magic happens when these two are combined. The last dangling Std Out pipe of any command chain needs to go into the Terminal, to be displayed as output. But the data coming out of Data Out is not necessarily human-friendly.

Thanks to the MIME-types, we can solve this universally. TermKit contains a library of output formatters which each handle a certain type of content (text, code, images, ...). It selects the right formatter based on the Content-Type, which then generates a stream of view updates. These go over the View Out pipe and are added to the command output.

As a result, you can cat a PNG and have it just work. TermKit cat doesn't know how to process PNGs or HTML—it only guesses the MIME type based on the filename and pipes the raw data to the next process. Then the formatter sends the image to the front-end. If you cat a source code file, it gets printed with line numbers and syntax highlighting.

So where does "somewhat parseable text" fit in? It turns out to be mostly unnecessary. Commands like ls output structured data by nature, i.e. a listing of files from one or more locations. It makes sense to pipe around this data in machine-form. Output flags like ls -l become mere hints for the final display, which can toggle on-the-fly between compact and full listing.

In TermKit's case, JSON is the interchange format of choice. The Content-Type for file listings is application/json; schema=termkit.files. The schema acts as a marker to select the right output plug-in. In this case, we want the file formatter rather than the generic raw JSON formatter.
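
For illustration, a hypothetical payload could look like this (the field names are my guesses, not TermKit's actual schema):

[
  { "name": "notes.txt", "size": 2048,  "mode": "rw-r--r--" },
  { "name": "demo.png",  "size": 51200, "mode": "rw-r--r--" }
]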

Isn't JSON data harder to work with than lines of text? Only in some ways, but parsing JSON is trivial these days in any language. Because of this, I built TermKit grep so it supports grepping JSON data recursively. This happens transparently when the input is application/json instead of text/plain. As a result ls | grep works as you'd expect it to.
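
A minimal sketch of that idea (my own code, not TermKit's actual implementation): walk the structure and keep any branch that contains a matching leaf.

function jsonGrep(pattern, value) {
  // Arrays: keep the elements that match, drop the rest.
  if (Array.isArray(value)) {
    var hits = [];
    for (var i = 0; i < value.length; i++) {
      var hit = jsonGrep(pattern, value[i]);
      if (hit !== undefined) hits.push(hit);
    }
    return hits.length ? hits : undefined;
  }
  // Objects: keep the keys whose values match.
  if (value !== null && typeof value === 'object') {
    var out = {}, found = false;
    for (var key in value) {
      var hit = jsonGrep(pattern, value[key]);
      if (hit !== undefined) { out[key] = hit; found = true; }
    }
    return found ? out : undefined;
  }
  // Leaves: match the pattern against the stringified value.
  return pattern.test(String(value)) ? value : undefined;
}

For example, jsonGrep(/\.png$/, listing) would keep only the PNG entries of a file listing.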

To slot in traditional Unix utilities in this model, we can pipe their data as application/octet-stream to start with, and enhance specific applications with type hints and wrapper scripts.

Finally, having type annotations on pipes opens up another opportunity: it allows us to pipe in HTTP GET / POST requests almost transparently. Getting a URL becomes no different from catting a file, and both can have fancy progress bars, even when inside a pipe chain like get | grep.

Synchronous interaction

All interaction in a traditional terminal is synchronous. Only one process is interactive at a time, and each keystroke must be processed by the remote shell before it is displayed. This leads to an obvious daily frustration: SSH keystroke lag.

To fix this, TermKit is built out of a separate front-end and back-end. The front-end can run locally, controlling a back-end on a remote machine. The connection can be tunneled over SSH for security.

Architecture diagram (TK stands for TermKit)

Additionally, all display updates and queries are asynchronous. The WebKit-based HTML display is split up into component views, and the view pipes of each subprocess are routed to their own view. Vice-versa, any interactive widgets inside a view can send callback messages back to their origin process, as long as it's still running.

This also allows background processes to work without overflowing the command prompt.

String-based command line

A lot of my frustration comes from bash's arcane syntax. It has a particularly nasty variant of C-style escaping. Just go ahead and try to match a regular expression involving both types of quotes.

But at its core, a bash command is a series of tokens. Some tokens are single words, some are flags, some are quoted strings, some are modifiers (like | and >). It makes sense for the input to reflect this.

TermKit's input revolves around tokenfield.js, a new snappy widget with plenty of tricks. It can do auto-quoting, inline autocomplete, icon badges, and more. It avoids the escaping issue altogether, by always processing the command as tokens rather than text. Keys that trigger special behaviors (like a quote) can be pressed again to undo the behavior and just type one character.

The behaviors are encoded in a series of objects and regexp-based triggers, which transform and split tokens as they are typed. That means it's extensible too.
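As an illustration only (the trigger format here is a guess, not tokenfield.js's real API), a behavior table could look like:

    // Each behavior watches tokens as they're typed and transforms them.
    var behaviors = [
      { // An opening quote turns the rest of the token into a literal
        // string; spaces inside it no longer split the token.
        trigger: /^["']/,
        apply: function (token) { token.type = 'string'; },
      },
      { // A pipe character splits off a new command in the chain.
        trigger: /\|/,
        apply: function (token) { token.type = 'pipe'; },
      },
    ];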

Usability

At the end of the day, Unix just has bad usability. It tricks us with unnecessary abbreviations, inconsistent arguments (-r vs -R) and nitpicky syntax. Additionally, Unix has a habit of giving you raw data, but not telling you useful facts, e.g. 'r-xr-xr-x' instead of "You can't touch this" (ba-dum tsshh).

One of the Unix principles is nobly called "Least Surprise", but in practice, from having observed new Unix users, I think it often becomes "Maximum Confusion". We should be more pro-active in nudging our users in the right direction, and our tools should be designed for maximum discoverability.

For example, I want to see the relevant part of a man page in a tooltip when I'm typing argument switches. I'd love for dangerous flags to be highlighted in red. I'd love to see regexp hints of possible patterns inline.

There's tons to be done here, but we can't do anything without modern UI abilities.

Focus and Status

With a project like TermKit, it's easy to look at the shiny exterior and think "meh", or that I'm just doing things differently for difference's sake. But to me, the real action is under the hood. With a couple of tweaks and some uncompromising spring cleaning, we can get Unix to do a lot more for us.

The current version of TermKit is just a rough alpha, and what it does is in many ways just parlour tricks compared to what it could be doing in a few months. The architecture definitely supports it.

I've worked on TermKit off and on for about a year now, so I'd love to hear feedback and ideas. Please go check out the code.

TermKit owes its existence to Node.js, Socket.IO, jQuery and WebKit. Thanks to everyone who has contributed to those projects.

Edit, a couple of quick points:

  • A Linux port will definitely happen, since it's built out of WebKit and Node.js. Whoever does it first gets a cookie.
  • TermKit is not tied to JSON except in its own internal communication channels. TermKit Pipes can be in any format, and old-school plain-text still works. JSON just happens to be very handy and very lightweight.
  • The current output is just a proof of concept and lacks many planned usability enhancements. There are mockups on github.
  • If you're going to tell me I'm stupid, please read all the other 100 comments doing so first, so we can keep this short for everyone else.

Edit, random fun:

Someone asked for AVS instead of TermKit in the comments... best I could do was JS1K with a PDF surprise:

Making Worlds 4 - The Devil's in the Details

Last time I'd reached a pretty neat milestone: being able to render a somewhat realistic rocky surface from space. The next step is to add more detail, so it still looks good up close.

Adding detail is, at its core, quite straightforward. I need to increase the resolution of the surface textures, and further subdivide the geometry. Unfortunately I can't just crank both up, because the resulting data is too big to fit in graphics memory. Getting around this will require several changes.

Strategy

Until now, the level-of-detail selection code has only been there to decide which portions of the planet should be drawn on screen. But the geometry and textures to choose from are all prepared up front, at various scales, before the first frame is started. The surface is generated as one high-res planet-wide map, using typical cube map rendering:

This map is then divided into a quad-tree structure of surface tiles. It allows me to adaptively draw the surface at several pre-defined levels of detail, in chunks of various sizes.

Quadtree terrain (Source)

This strategy won't suffice, because each new level of detail doubles the work up-front, resulting in exponentially increasing time and memory cost. Instead, I need to write an adaptive system to generate and represent the surface on the fly. This process is driven by the Level-of-Detail algorithm deciding if it needs more detail in a certain area. Unlike before, it will no longer be able to make snap decisions and instant transitions between pre-loaded data: it will need to wait several frames before higher detail data is available.

Configuration of chunks to render

Uncontrolled growth of increasingly detailed tiles is not acceptable either: I only wish to maintain tiles useful for rendering views from the current camera position. So if a specific detailed portion of the planet is no longer being used—because the camera has moved away from it—it will be discarded to make room for other data.

Generating Individual Tiles

The first step is to be able to generate small portions of the surface on demand. Thankfully, I don't need to change all that much. Until now, I've been generating the cube map one cube face at a time, using a virtual camera at the middle of the cube. To generate only a portion of the surface, I have to narrow the virtual camera's viewing cone and skew it towards a specific point, like so:

This is easy using a mathematical trick called homogeneous coordinates, which are commonly used in 3D engines. This turns 2D and 3D vectors into 3D and 4D vectors respectively. Through this dimensional redundancy, we can then represent most geometrical transforms as a 4x4 matrix multiplication. This covers all transforms that translate, scale, rotate, shear and project, in any combination. The right sequence (i.e. multiplication) of transforms will map regular 3D space onto the skewed camera viewing cone.

Given the usual centered-axis projection matrix, the off-axis projection matrix is found by multiplying with a scale and translate matrix in post-projection "screen space". The thing with homogeneous coordinates is that it seems like absolute crazy talk until you get it. I can only recommend you read a good introduction to the concept.
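As a sketch of the math (assuming column vectors; multiply4x4 stands in for an ordinary 4x4 matrix multiply): to zoom a standard projection P into the screen-space sub-rectangle [u0, u1] x [v0, v1] of the usual [-1, 1] square, scale and translate after projecting:

    function offAxis(P, u0, u1, v0, v1) {
      var sx = 2 / (u1 - u0), sy = 2 / (v1 - v0);
      var tx = -sx * (u0 + u1) / 2, ty = -sy * (v0 + v1) / 2;
      var S = [ // scale-translate in post-projection space (row-major)
        sx,  0,  0, tx,
         0, sy,  0, ty,
         0,  0,  1,  0,
         0,  0,  0,  1,
      ];
      return multiply4x4(S, P); // P' = S * P
    }

At u = u0 the new x maps to -1, at u = u1 it maps to +1, so the chosen tile fills the entire render target.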

With this in place, I can generate a zoomed height map tile anywhere on the surface. As long as the underlying brushes are detailed enough, I get arbitrarily detailed height textures for the surface. The normal map requires a bit more work however.

Normals and Edges

As I described in my last entry, normals are generated by comparing neighbouring samples in the height map. At the edges of the height map texture, there are no neighbouring samples to use. This wasn't an issue before, because the height map was a seamless planet-wide cube map, and samples were fetched automatically from adjacent cube faces. In an adaptive system however, the map resolution varies across the surface, and there's no guarantee that those neighbouring tiles will be available at the desired resolution.

The easy way out is to make sure the process of generating any single tile is entirely self-sufficient. To do this, I expand each tile with a 1 pixel border when generating it. Each such tile is a perfectly dilated version of its footprint and overlaps with its neighbours in the border area:

This way all the pixels in the undilated area have easily accessible neighbour pixels to sample from. This border is only used during tile generation, and cropped out at the end. Luckily I did something similar when I played with dilated cube maps before, so I already had the technique down. When done correctly, the tiles match up seamlessly without any additional correction.
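For reference, the neighbour sampling itself is just central differences. A minimal sketch, assuming a height() lookup in tile texel space and a uniform grid spacing:

    // Normal = normalized cross product of the two surface tangents
    // (1, 0, dh/dx) and (0, 1, dh/dy) over the height field.
    function normalAt(height, x, y, spacing) {
      var dx = (height(x + 1, y) - height(x - 1, y)) / (2 * spacing);
      var dy = (height(x, y + 1) - height(x, y - 1)) / (2 * spacing);
      var len = Math.sqrt(dx * dx + dy * dy + 1);
      return [-dx / len, -dy / len, 1 / len];
    }

With the 1 pixel border in place, every pixel of the final cropped tile has valid neighbours on all four sides.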

Adaptive Tree

Now I need to change the data structure holding the mesh. To make it adaptive, I've rewritten it in terms of real-time 'split' and 'merge' operations.

Just like before, the Level-of-Detail algorithm traverses the tree to determine which tiles to render. But if the detail available is not sufficient, the algorithm can decide that a certain tile in the tree needs a more detailed surface texture, or that its geometry should be split up further. Starting with only a single root tile for each cube face, the algorithm divides up the planet surface recursively, quickly converging to a stable configuration around the camera.

As the camera moves around, new tiles are generated, increasing memory usage. To counter this steady stream of new data, the code identifies tiles that fall into disuse and merges them back into their parent. The overall effect is that the tree grows and shrinks depending on the camera position and angle.
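Schematically, the traversal might look like this sketch (all names hypothetical; the real tree also tracks surface textures separately from geometry):

    function traverse(tile, camera) {
      if (needsMoreDetail(tile, camera)) {
        if (!tile.children) {
          queue.push({ op: 'split', tile: tile }); // request children
          render(tile);                            // draw what we have
        } else {
          tile.children.forEach(function (child) {
            traverse(child, camera);
          });
        }
      } else {
        if (tile.children && fallenIntoDisuse(tile, camera)) {
          queue.push({ op: 'merge', tile: tile }); // reclaim memory
        }
        render(tile);
      }
    }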

Queuing and scheduling

To do all this real-time, I need to queue up the various operations that modify the tree, such as 'split', 'merge' and 'generate new tile'. They need to be executed in between rendering regular frames on screen. Whenever the renderer decides a certain tile is not detailed enough, a request is placed in a job queue to address this.

While continuing to render regular frames, these requests need to be processed. This is harder than it sounds, because both planet rendering and planet generation have to share the GPU, preferably without causing major stutters in rendering speed.

The solution is to spread this process over enough visible frames so that the overall rendering speed is not significantly affected. For example, if a new surface texture is requested, several passes are made. First the height map is rendered, the next frame the normal map is derived from it, then the height/normal maps are analyzed and put into the tree, after which they will finally appear on screen:
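In pseudo-JavaScript, the scheduling amounts to draining a little of the queue each frame (a sketch, with an assumed per-frame time budget):

    function onFrame() {
      var budget = 2; // ms of generation work per frame (assumed)
      var start = Date.now();
      while (queue.length && Date.now() - start < budget) {
        var job = queue.shift();
        // step() performs one pass (height map, normal map, analysis)
        // and returns true once the job has completed all of them.
        if (!job.step()) queue.push(job);
      }
      renderScene();
    }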

I took some inspiration from id Tech 5, the next engine coming from technology powerhouse id Software. They describe a queued job system that covers any frame-to-frame computation in a game engine (from texture management to collision detection), and which schedules tasks intelligently.

Do the Google Earth

With all the above in place, the engine can now progressively increase the detail of the planet across several orders of magnitude. Here's a video that highlights it:

And some shots that show off the detail:

W00t, certainly one of the niftiest things I've built.

Engine tweaks

Along with the architecture changes, I implemented some engine tweaks, noted here for completeness.

In previous comments, Erlend suggested using displacement mapping, so I gave it a shot. Before, the mesh for every tile was calculated on the CPU once, then copied into GPU memory. However, this mesh data was redundant, because it was derived literally from the height map data. Instead I changed it so that now, the transformation of mesh points onto the sphere surface happens real-time on the GPU in a per-vertex program.

This saves memory and pre-calculation time, but increases the rendering load. I'll have to see whether this technique is sustainable, but overall, it seems to be performing just fine. As a side effect, the terrain height map can be changed real-time with very low cost.

Technical hurdles

I spent some time tweaking the engine to run faster, but there's still plenty of work and some technical hurdles to cover.

One involves the Ogre Scene Manager, which is the code object that manages the location of objects in space. In my case, I have to deal with both the 'real world' in space as well as the 'virtual world' of brushes that generate the planet's surface. I chose to use two independent scene managers to represent this, as it seemed like a natural choice. However, it turns out this is unsupported by Ogre and causes random crashes and edge cases. Argh. It looks like I'll have to refactor my code to fix this.

Another major hurdle involves the planet surface itself. Currently I'm still just using a single distorted-crater-brush to create it, and the lack of variation is showing.

Finally, surfaces are being generated using 16-bit floating point height values, and their accuracy is not sufficient beyond a couple levels of zooming. This results in ugly bands of flat terrain. To fix this I'll need to increase the surface accuracy.

Future steps

With the basic planet surface covered, I can now start looking at color, atmosphere and clouds. I have plenty of reading and experimentation to do. Thankfully the web is keeping me supplied with a steady stream of awesome papers... nVidia's GPU Gems series has proven to be a gold mine, for example.

Random factoid: what game developers call a "cube map", cartographers call a "cubic gnomonic grid". It turns out that knowing the right terminology is important when you're looking for reference material...

Code

The code is available on GitHub.

References

Great ideas are best discovered when standing on the shoulders of giants:

A Useful BitTorrent Analogy

BitTorrent has been around for over a decade now. And yet, when mentioned in the media, it's pretty much universally associated with piracy and illegal file sharing.

Just the other day, I saw a journalist write proudly: "No, I don't have a Torrent program and I'm not downloading one." A journalist! Someone who is supposed to be an expert at retrieving information and sharing it!

BitTorrent is not scary; in fact, it generates the majority of traffic on the internet. In the 21st century it should be a tool that sits on your digital utility belt, not something you wouldn't touch with a 10 foot pole. So here's a simple analogy to help understand it.

· • ·

Imagine a budget-starved teacher needs to hand out notes for class, but can only afford one copy. The document is 10 pages long, and there are 10 students who each need a complete copy.

The teacher could just give the notes to one student, and ask him to make all the copies, but that would only shift the burden, leaving him to pay for all 100 pages.

Instead, the teacher has an idea. She hands page 1 to student #1, page 2 to student #2, and so on, and tells each student to make 10 copies of their single page. The next week, the students can distribute them amongst themselves before class, and everyone gets a complete set. Nobody has to pay for more than their own 10 pages.

Everyone's happy: the teacher gets to share her knowledge cheaply, and the students don't mind paying for their own copies.

In the middle of the term, a new student joins. She could borrow someone else's big pile of notes, and copy the entire stack of paper, but that would mean she would have to pay for it all, and she's on a budget too.

So instead, she just goes around and asks each student to make a single copy of the pages they were assigned previously. The next week, she collects all the pages, and assembles a complete copy without even bothering the teacher.

She gets a free pass to catch up with the class, but the other students don't mind chipping in. That's because she immediately joins the game and can make copies too. The teacher can now hand out one page extra each week, or decide to give one student a free pass. If more students join, it works better and better.

Now instead, imagine that students join and leave the class every single day, and the teacher isn't quite so organized. She just puts her big stack of notes on the desk, and tells everyone they can take any page they want, as long as they promise to immediately make copies for anyone who asks. The students are all friendly, and make sure to keep each other in the loop about which pages everyone has. Both the originals and the copies are copied as many times as needed.

· • ·

That's BitTorrent in a nutshell. For any given class—i.e. a file that people are interested in—a cloud of students forms—i.e. the peers in the so called peer-to-peer network. The peers compare notes, see which pieces they are missing, and swap copies with each other. Eventually, the teacher (a.k.a. the seeder) can leave, taking her original copy with her, and the system will keep working. As long as there is at least one copy of every page in the room, the students can make more, and the document as a whole will live on.

This is pretty much the only way you can effectively distribute a massive archive of sensitive data to thousands or millions of people, without incurring massive bills. You can't use free or ad-supported services, as the material would get taken down instantly due to its sensitive nature. And you can't host it directly, as that would leave a trail pointing back to you.

With BitTorrent, your initial group of 'students' can be sworn to secrecy. After the initial round of copying, the teacher sneaks out, and the students just pin a notice on the bulletin board: "We have copies of The Forbidden Secrets by Dr. X. Come see us." Nobody claims to know who Dr. X is. Ideas and information flow freely, without censorship.

This is Your Brain on CSS

First things first: the CSS 3D renderer used to power this site is now available on GitHub.com. However, it's still limited to only solid lines and planes. It's also limited to WebKit browsers, as Firefox's CSS 3D support just isn't quite there yet.

But CSS 3D is not a one trick pony, and as with many things, what you get out of it depends entirely on what you put in. So here's a disembodied head made out of CSS 3D. It consists of nothing more than a bunch of images stacked up against each other, and integrates perfectly with the existing 3D parallax on this site. Click and drag to rotate, or use the slider to look inside.

Making the basic effect was actually quite easy. I took an MRI from the Stanford Volume Data Archive and wrote a small script to turn it into a sheet of CSS sprites. There's one file for color, one for opacity, totalling about 2.1 MB. Both files are composited into Canvases and placed in slices into the DOM, offset forward or backwards in 3D. Then there's just some minor logic to rotate the slices in 90 degree increments to follow the camera.
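The slice placement itself is simple. A sketch with hypothetical names and an assumed spacing, in the spirit of the real thing:

    // Stack each composited slice canvas on its own plane along Z.
    var spacing = 4; // px between slices (assumed)
    slices.forEach(function (canvas, i) {
      var z = (i - slices.length / 2) * spacing;
      canvas.style.webkitTransform = 'translateZ(' + z + 'px)';
      container.appendChild(canvas);
    });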

But the slices are rendered as is, and the MRI consists of boring grayscale data. Luckily, I can precompute any amount of shaders and effects I want and just bake them into the slices. I geeked out by applying fake specular lighting, for that 'fresh meat' look, and volumetric obscurance to enhance the sense of depth on the inside. I changed the palette to gory colors based on local density, giving the impression of flesh and bone knitting itself together. Creepy, but cool.

I wrapped it in a custom widget, using straight up CSS rather than Three.js this time. I've wanted to play with Tangle.js, so I used that to hook up the camera controls and slider. That's pretty much it. In an ideal world, the jarring jump when rotating would be covered up by a smooth transition, but the browsers don't like it.

Introducing Facing.me

A unique way to meet people

We've been sending out whispers for a while now, but it's finally out: a new web site called Facing.me. Coded and designed by Michael Holly, Ross Howard-Jones and myself, it promises a unique way to meet people online. This would be the point where the obvious question is dropped: wait, what… you built a dating site?

Sort of. Let me explain.

Having spent many years in the web world, we'd all gotten a bit complacent. The web has settled into its comfortable rhythms. Sites and applications can be modelled quickly and coded on your framework of choice. And nowadays, Web 2.0 cred comes baked in: clean URLs, semantic HTML, AJAX, data feeds, APIs, etc. Isn't this what we all wanted?

But the web continues to evolve, and giants are roaming the playground. Sites like Facebook and Twitter hold people's attention with surgical precision, while engines like Google answer your queries with lightning speed. Given that we've all slotted such services into our workflows and indeed lives, it seems only natural that 'indie' developers should keep up. We can't pretend that a 2000-era style web-page-with-ajax-sprinkles is the pinnacle of modern interactive design.

So we set out to try something different.

Facing.me website

A Guy Walks into a Bar...

If you've managed to score an invite, the first thing you'll see is the wall of faces that loads and fills the screen. The second thing you'll notice—we hope at least—is the lack of everything else.

The metaphor we kept in mind was the idea of walking into a bar, and looking around. If you see someone you like, you can go up to them and strike up a conversation. So that's exactly what the app lets you do, through video chat. You can pan around to see more people, and just keep going. If you're looking for something specific, you can filter your view with a simple "I'm looking for…" dialog.

As you mouse around, you can see who's online, and flip open their profile. If you want to strike up a video chat, it happens right there too. If the person is online, they'll see your request immediately in a popup and can choose to accept or decline after reviewing your profile. If they're offline, they'll see your request next time they visit.

To avoid missed connections, you can 'like' people you're interested in. You'll see (and hear) a notification pop up the moment they're online. You can keep the app open in a background tab and never miss a thing.

Aside from some minor social glue and a few fun little extras for you to discover, that's it. It's our twist on a minimum viable product, if you will. Studies have shown that online matching algorithms are a poor predictor for how well people mesh in person. Until you meet face-to-face, you just don't know. We think direct, spontaneous video chat is a better first step than endless profile matching and messaging.

Polishing Bacon

But despite its minimalism, a big aspect of Facing.me is the effort and care we put into it. Our goal was to achieve a level of polish typically reserved for premium iPhone apps and bring it into the browser. We wrapped the whole thing in a crisp design, enhanced with tasteful web fonts. But most importantly, we sought to expose the app's functionality with as little interruption as possible. To do that, we layered on plenty of transitions driven by CSS3 and JavaScript, and stream in data and content as needed.

Based on previous work in custom animations—and bacon—we refined the approach of using jQuery as an animation helper for completely custom transitions. We tell jQuery to animate placeholder properties on orphaned proxy divs, and key off those animations with per-frame code to drive the fancy stuff.
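A minimal sketch of the technique (element names hypothetical; jQuery's per-frame step callback is real):

    // Animate a throwaway property on an orphaned proxy div, and use
    // the step callback to drive arbitrary custom CSS each frame.
    var proxy = $('<div>').css({ opacity: 0 });
    proxy.animate({ opacity: 1 }, {
      duration: 600,
      step: function (t) { // t runs from 0 to 1
        $('#photo').css({
          webkitTransform: 'rotateY(' + (t * 180) + 'deg)',
          boxShadow: '0 0 ' + (t * 30) + 'px rgba(0, 0, 0, 0.5)',
        });
      },
    });

Because the proxy is never inserted into the document, jQuery's animation loop becomes a pure tweening engine, and the real elements only ever see the hand-tuned styles applied in the step function.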

facing.me animation example

As a result, we can have a photo grow a picture frame as you pick it up, and then flip it around to show a person's full profile. This careful choreography involves animating about a dozen CSS properties, including borders, shadows, margins and 3D transforms, all with custom expressions and hand-tuned animation curves. Similar transitions are used for lightbox dialogs.

Throughout all of this, the animations remain eminently manageable. We can interrupt and reverse them at any point, and run multiple copies at the same time, thanks to pervasive use of view controllers. Far from being a useless tech demo, it actually enables us to craft the user experience exactly the way we like it: being able to acknowledge user intentions with intuitive feedback no matter what's going on, and firing off new events and requests without worrying about the internal state. Gone are the fragile jQuery behavior soups of old.

The one downside is that only the newer browsers—i.e. Chrome, Safari and Firefox—get to see everything the way it was intended. And actually the performance in Firefox is still a bit disappointing. IE9 users will have to be satisfied with a crude 2D approximation until IE10 comes out.

Rapid Rails and Real-Time Node

To make all this work effectively on the server-side, we used a dual-mode stack of Rails and Node.js.

The Rails side houses the app's models and controllers, and provides an API for all the client-side JavaScript to do its job. Video chats are handled through Flash and routed through its built-in peer-to-peer functionality.

The node.js component acts as a real-time presence daemon which users connect to over socket.io. It's used to drive the status notifications and to coordinate the video chats. We can exchange any sort of notifications between users with a publish-subscribe model, opening up many interesting avenues for future development.

Overall, this approach has worked out great. Rails' ActiveRecord and the stack around it allowed us to build out functionality quickly and with just the right amount of necessary baggage. We made generous use of Ruby Gems to save time while still maintaining full control.

Node.js's event-driven model adds real-time signalling with no hassle. For the few cases where node.js needs to interface with the Rails database directly, we slot in some manual SQL to take care of that. For everything else, Rails and node.js exchange signed data through the browser.

Come Take it for a Spin

Finally, we also put our heads together and made a promo video, voiced by the lovely Tina Hoang:

Built in our spare time by just 3 guys in a virtual garage, we're pretty proud of the end result. We'd love for you to take it for a spin, so head over to facing.me and grab yourself an invite. There's a feedback form built-in, and any suggestions are welcome.

Discuss on Google Plus.

Going Full Frontal

Making things with Maths

Last week, I had the privilege of speaking about "Making things with Maths" at Full Frontal, a tech conference hosted in a gorgeous picturehouse in the seaside town of Brighton, UK. I was nervous as hell: I hadn't attended a tech conference in ages, let alone taken the stage, and I'd never done a talk on this subject before. I'd been planning and working for months to assemble the code just to be able to show what I saw in my head—and of course scrambled to finish the week before regardless. This talk has been the number one thing on my mind for a while.

Yet two days later, I barely remember my own part in it, and find myself mulling over everything else that was said instead. It was simply too good, too provoking, not to think about.

The lovely duo of organizers, Remy and Julie Sharp, have crafted something very special. The line up was stellar: each and every speaker challenged my preconceptions with the kind of casualness that only in-depth experience can bring.

Arguments for abandoning the purity of vanilla HTML (James Pearce) were followed by a philosophical lesson on not throwing away the baby with the bathwater (John Allsopp) and I found myself agreeing wholeheartedly with both, cognitive dissonance notwithstanding. As someone who isn't a fan of the mobile app world, I had to admit I was ignorant on the difficulties of implementing offline web apps (Andrew Betts), and blissfully unaware of the absolute zoo of devices people really do try to access the web with (Anna Debenham), webological purity be damned.

We've barely scratched the surface of what browsers can do (Paul Kinlan), we need to chase the high of writing code and actually have it Just Work (Rebecca Murphey), and above all, we need to remember where it all came from (Chris Wilson), lest we repeat the mistakes of the past. If you weren't one of the lucky people who managed to snag tickets before it sold out, take some time out of your busy day to enjoy these sessions in video once they are posted online rather than just skipping through slides.

If there's one thing that stood out though, it's how little of what I heard on and off-stage is part of the daily discourse online in the tech world, in the news or on sites like HackerNews, Twitter and Reddit. We only see caricatures of these conversations. More than ever, I'm convinced I need to filter out these echo chambers from my thoughts and seek out more substance. Particularly, the Silicon Valley-centric TechCrunch-driven worship of runaway success adds nothing, and only holds us back. It makes people think they need to chase something that only ever happens by accident, and diverts attention away from rolling up your sleeves and doing what actually needs to be done. This was emphasized all the more by the fact that the venue had no wi-fi, which meant everyone had their eyes away from their screens for a change.

To seal the deal, the conference was flanked by in-depth workshops, the obvious drinks and social gatherings and even a NodeCopter hackathon, where we made quadcopters do crazy things with nothing more than JavaScript. I only wish I'd been more rested and less jetlagged so I could've spoken to more folks the past few days.

Thank you to the organizers and volunteers, to the event crew, to my fellow speakers, to the people who travelled from near and far to listen, and to whomever decided to stick those funky legs on top of the cinema. They heralded quite literally that things were about to be turned upside down, and the event certainly delivered on that.

Video and Slides

You can read more about MathBox in the follow-up blog post.

The adventurous can go see how the sausage was made and check out the code for MathBox, the library I wrote to make it happen, as well as the HTML5 slide deck.

Making MathBox

Presentation-Quality Math with Three.js and WebGL

For most of my life, I've found math to be a visual experience. My math scores went from crap to great once I started playing with graphics code, found some demoscene tutorials, and realized I could reason about formulas by picturing the graphs they create. I could apply operators by learning how they morph, shift, turn and fold those graphs and create symmetries. I could remember equations and formulas more easily when I could layer on top the visual relationships they embody. I was less likely to make mistakes when I could augment the boring symbolic manipulation with a mental set of visual cross-checks.

So, when tasked with holding a conference talk on how to make things out of math at Full Frontal, I knew the resulting presentation would have to consist of intricate visualizations as the main draw, with whatever I had to say as mere glue to hold it together.

The problem was, I didn't know of a good tool to do so, and creating animations by hand would probably be too time consuming. With the writings of Paul Lockhart and Bret Victor firmly in mind, I also knew I wanted to start blogging more about mathematical concepts in a non-traditional way, showing the principles of calculus, analysis and algebra the way I learnt to see them in my head, rather than through the obscure symbols served up in engineering school.

So I set out to create that tool, keeping in mind the most important lesson I've picked up as a web developer: one cannot overstate the value of being able to send someone a link and have it just work, right there. It was obvious it would have to be browser-based.

Choose your Poison

Now, when people think of graphs in a browser, the natural thought is vector graphics and SVG, which quickly leads to visualization powerhouse d3.js. And it cannot be overstated: d3.js really is an amazing piece of tech with a vast library of useful code to accompany it. When I wrapped my head around how d3's enter/exit selections are implemented and how little it actually does to achieve so much, I was blown away. It's just so elegant and simple.

Unfortunately, d3's core is intricately tied to the DOM through SVG and CSS. Ironically, that means d3 is not really capable of 3D. Additionally, d3 is a power tool that makes no assumptions: it is up to you to choose which visual elements and techniques to use to make your diagrams, and as such it is more like assembly language for graphs than a drop-in tool. These two were show stoppers.

For one, manually designing layouts, grids, axes, etc. every time is tedious. You should be able to drop in a mathematical expression with as little fanfare as possible and have it come out looking right. This includes sane defaults for transitions and animations.

For another, I've found that, when in doubt, adding an extra dimension always helps. The moment I finally realized that every implicit graph in N dimensions is really just a slice of an explicit one in N+1 dimensions, a ridiculous amount of things clicked together. And it took until years after studying signal processing to at long last discover the 4D picture of complex exponentiation that tied the entire thing together (projected into 3D below): it revealed the famous "magic formula" involving e, i and π to be a meaningless symbological distraction, a pinhole view of a much larger, much more beautiful structure, underpinning every Fourier and Z transform I'd ever encountered.

So, WebGL it was, because I needed 3D. Unfortunately that meant the promise of having it just work everywhere was tempered by a lack of browser support, but I would certainly hope that's something we can overcome sooner than later. Dear Apple and Microsoft: get your shit together already. Dear Firefox and Opera: your WebGL performance could be a lot better.

Shady Dealings

These days I don't really touch WebGL without going through Three.js first. Three.js is a wonderful, mature engine that contains tons of useful high-level components. At the same time, it also does a great job in just handling the boilerplate of WebGL while not getting in the way of doing some heavy lifting yourself.

Rendering vector-style graphics with WebGL is not hard, certainly easier than photorealistic 3D. Primitives like lines and points are sized in absolute pixels by default, and with hardware multisampling for anti-aliasing, you get somewhat decent image quality out of it. Though, as is typical for a Web API, we're treated like children and can only cross our fingers and request anti-aliasing politely, hoping it will be available. Meanwhile native developers have full control over speed and quality and can adjust their strategy to the specific hardware's capabilities. The more things change... And then Chrome decided to disable anti-aliasing altogether due to esoteric security issues with buggy drivers. Bah.

Now, when rendering with WebGL, you really have two options. One is to just treat it as a dumb output layer, loading or generating all your geometry in JavaScript and rendering it directly in 3D. With the speed of JS engines today, this can get you pretty far.

The second option is to leverage the GPU's own capabilities as much as possible, doing computations in GLSL through so-called vertex and fragment shader programs. These are run for every vertex in a mesh, every pixel being drawn, and have been the main force driving innovation in real-time graphics for the past decade. With the goal of butter-smooth 60fps graphical goodness, this seemed like the better choice.

Unfortunately, GLSL shaders are rather monolithic things. While you do have the ability to create subroutines, every shader still has to be a stand-alone program with its own main() function. This means you either need to include a shader for every possible combination of operations, or generate shader code dynamically by concatenating pre-made snippets or using #ifdef switches to knock them out. This is the approach taken by Three.js, which results in some very hairy code that is neither easy to read nor easy to maintain.

Having made a prototype, I knew I wanted to show continuous transitions between various coordinate systems (e.g. polar and spherical), knew I needed to render shaded and unshaded geometry, and knew I would need to slot in specific snippets for things like point sprites, bezier curves/surfaces, dynamic tick marks, and more. Sorting this all out Three.js-style would be a nightmare.

So I wrote a library to solve that problem, called ShaderGraph.js. It is best described as a smart code-concatenator, a few steps short of writing a full blown compiler. You feed it snippets of GLSL code, each with one or more inputs and outputs, and these get parsed and turned into lego-like building blocks. Each input/output becomes an outlet, and outlets are wired up in a typical dataflow style. Given a graph of connected snippets, it can be compiled back into a program by assembling the subroutines, assigning intermediate variables and constructing an appropriate main() function to invoke them. It also exports a list of all external variables, i.e. GLSL uniforms and attributes, so you can control the program's behavior easily.

If I'd stopped there however, I'd have just replaced the act of manual code writing with that of manually wiring graphs. So I applied the principle of convention-over-configuration instead: you tell ShaderGraph to connect two snippets, and it will automatically match up outlets by name and type. This is augmented by a chainable factory API, which allows you to pass a partially built graph around. It allows different classes to work together to build shaders, each inserting their own snippets into the processing chain. It really works like magic and I can't wait to use this in my next WebGL projects.
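In spirit, building a material reads something like this sketch; the method and snippet names are illustrative, not ShaderGraph.js's literal API:

    // Each snippet is a GLSL subroutine with typed inputs/outputs.
    // The factory matches outlets by name and type as it chains.
    var material = factory
      .snippet('mapToSphere')      // vertex: project grid point to sphere
      .snippet('polarTransition')  // vertex: blend cartesian <-> polar
      .snippet('solidColor')       // fragment: flat, vector-style shading
      .end();

    // material.uniforms lists every external GLSL variable, ready to
    // be wired up to Three.js uniforms.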

Viewports, Primitives and Renderables

At its core, Three.js matches pretty directly with WebGL. You can insert objects such as a Mesh, Line or ParticleSystem into your scene, which invokes a specific GL drawing command with high efficiency. As such, I certainly didn't want to reinvent the wheel.

Hence, MathBox is set up as a sort of scene-manager-within-a-scene-manager. It's a little sandbox that speaks the language of math, allowing you to insert various primitives like curves, vectors, axes and grids. Each of these primitives then instantiates one or more renderables, which simply wrap a native Three.js object and its associated ShaderGraph material. Thus, once instantiated, MathBox gets out of the way and Three.js does the heavy lifting as normal. You can even insert multiple mathboxen into a Three.js scene if you like, mixed in with other objects.

MathBox Architecture

For example, a vector primitive is rendered as an arrow: it consists of a shaft and an arrowhead, realized as a line segment and a cone. An axis primitive is an arrow as well, but it also has tick marks (specially transformed line segments), and is positioned implicitly just by specifying the axis' direction rather than a start and end point.

To render curves and surfaces, you can either specify an array of data points or a live expression to be evaluated at every point. This turned out to be essential for the kinds of intricate visualizations I wanted to show, my slides being driven by timed clocks, shared arrays of data points, and live formulas and interpolations. I even fed in data from a physics engine, and it worked perfectly.

This is all tied together through Viewport objects, which define a specific mapping from a mathematical coordinate space into the 3D world space of Three.js. For example, the default cartesian viewport has the range [–1, 1] in the X, Y and Z directions. Altering the viewport's extents will shift and scale anything rendered within, as well as reflow grids and tick marks on each axis.

There are two more sophisticated viewport types, polar and spherical, which each apply the relevant coordinate transform, and can transition smoothly to and from cartesian. More viewport types can be added, all that is required is to define an appropriate transformation in JavaScript and GLSL. That said, defining a seamless transition to and from cartesian space is not always easy, particularly if you want to preserve the aspect-ratio through the entire process.
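Put together, spawning a diagram reads roughly like this; the option names are paraphrased from the description above and may not match the library exactly:

    var mathbox = mathBox(element).start();

    mathbox
      .viewport({ type: 'polar', range: [[-2, 2], [-1, 1]] })
      .axis({ axis: 0 })    // x axis, with reflowing tick marks
      .grid({ axes: [0, 1] })
      .curve({
        domain: [-2, 2],
        expression: function (x) { return Math.sin(x * Math.PI); },
      });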

Interpolate all the things!

Finally, I had to tackle the problem of animation, keeping in mind a tip I learnt from the ever so mindbending Vihart: "If I can draw the point of a sentence, I don't actually need to say the sentence." This applies doubly so for animation: every time you replace a "before" and "after" with a smooth transition, your audience implicitly understands the change rather than having to go look for it.

Hence, each primitive can be fully animated. Each has a set of options (controlling behavior) and styles (controlling GLSL shaders), and there is a universal animator that can interpolate between arbitrary data types in a smart fashion.

For example, given a viewport with the XYZ range [[–1, 1], [–1, 1], [–1, 1]], you can tell it to animate to [[0, 2], [0, 1], [–3, 3]], and it just works. The animator will recursively animate each subarray's elements, and any dependent objects like grids and axes will reflow to match the intermediate values. This works for colors, vectors and matrices too. In case of live curves with custom expressions, the animator will invoke both the old and the new, and interpolate between the results.
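The heart of that animator is a recursive interpolation; a sketch, not the library's exact code:

    function interpolate(a, b, t) {
      if (typeof a === 'number') return a + (b - a) * t;
      if (a instanceof Array) {
        return a.map(function (v, i) { return interpolate(v, b[i], t); });
      }
      if (typeof a === 'function') {
        // Live expressions: invoke both old and new, blend the results.
        return function (x) { return interpolate(a(x), b(x), t); };
      }
      return t < 0.5 ? a : b; // non-interpolable values flip halfway
    }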

However, executing animations manually in code is tedious, particularly in a presentation, where you want to be able to step forward and backward. So I added a Director class whose job it is to coordinate things. All you do is feed it a script of steps (add this object, animate that object). Then, as it applies them, it remembers the previous state of each object and generates an automatic rollback script. It also contains logic to detect rapid navigation, and will hurry up animations appropriately, avoiding that agonizing situation of watching someone skip through their slide deck, playing the same cheesy PowerPoint transitions over and over again.
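A script is just data. Something in this vein (the exact operation format is an assumption):

    var script = [
      [ // step 1: add a live curve
        ['add', 'curve', {
          id: 'wave',
          domain: [-1, 1],
          expression: function (x) { return Math.sin(x * Math.PI); },
        }],
      ],
      [ // step 2: rescale the viewport; grids and axes reflow along
        ['animate', 'viewport',
          { range: [[0, 2], [0, 1], [-3, 3]] },
          { duration: 1000 }],
      ],
    ];

    var director = new MathBox.Director(mathbox, script);
    director.forward(); // stepping back plays the auto-generated rollback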

Presenting Naturally

With MathBox's core working, it was time to build my slides for the conference. After a quick survey, I quickly settled on deck.js as an HTML5 slidedeck solution that was clean and flexible enough for my purposes. However, while MathBox can be spawned inside any DOM element, it wouldn't work to insert a dozen live WebGL canvases into the presentation. The entire thing would grind to a halt or at least become very choppy.

So instead, I integrated each MathBox graphic as an IFRAME, and added some logic that only loads each IFRAME one slide before it's needed, and unloads it one slide after it's gone off screen. To sync up with the main presentation, all deck.js navigation events were forwarded into each active IFRAME using window.postMessage. With the MathBox Director running inside, this was very easy to do, and meant that I could skip around freely during the talk, without any worries of desynchronization between MathBox and the associated HTML5 overlays.
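The forwarding glue is small. A sketch (deck.change is deck.js's real navigation event; the message format and director.go are mine):

    // Presentation side: relay slide changes into the active iframe.
    $(document).on('deck.change', function (event, from, to) {
      var frame = $('iframe.active')[0];
      if (frame) frame.contentWindow.postMessage({ slide: to }, '*');
    });

    // Inside each iframe: hand the slide index to the Director.
    window.addEventListener('message', function (event) {
      director.go(event.data.slide);
    }, false);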

In fact, I applied the same principle to this post. To avoid rendering all diagrams simultaneously and spinning up laptop fans more than necessary, each MathBox IFRAME is started as it scrolls into view and stopped once it's gone.

I've also found that having a handheld clicker makes a huge difference while speaking—as it allows you to gesture freely and move around. So, I grabbed the infrared remote code from VLC and built a simple bridge from Cocoa to Node.js to WebSocket to allow the remote to work in a browser. It's a shame Apple's decided to discontinue IR ports on their laptops. I guess I'll have to come up with a BlueTooth-based solution when I upgrade my hardware.

Towards MathBox 1.0

In its current state, MathBox is still a bit rough. The selection of primitives and viewports is limited, and only includes the ones I needed for my presentation. That said, it is obvious you can already do quite a lot with it, and I couldn't have been happier to hear that all this effort had the desired response at the conference. I wasn't 100% sure whether other people would have the same a-ha moments that I've had, but I'm convinced more than ever that seeing math in motion is essential for honing our intuition about it. MathBox not only makes animated diagrams much easier to make and share, but it also opens the door to making them interactive in the future.

I plan to continue to evolve MathBox as needed by using it on this site and addressing gaps that come up, though I've already identified a couple of sore points:

  • I used tQuery as a boilerplate and because I liked the idea of having a chainable API for this. However, this also means it's currently running off an outdated version of Three.js. I need to look into updating and/or dropping tQuery.
    MathBox has been updated to Three.js r52.
  • Numeric or text labels are completely unsupported. It should be possible to use my CSS3D renderer for Three.js to layer on beautifully typeset MathJax formulas, positioning them correctly in 3D on top of the WebGL render.
    I've added labeling for axes. I've integrated MathJax, but it's tricky because the typesetting is painfully slow in the middle of a 60fps render. But it's automatically used if MathJax is present.
  • All styles have to be specified on a per-object basis. Some form of stylesheet, default styles or class mechanism to allow re-use seems like an obvious next step.
  • There are undoubtedly memory leaks, as I was focused first and foremost on getting it to work.
  • Expressions that don't change frame-to-frame are still continuously re-evaluated, which is wasteful. There is a live: false flag you can set on objects, but it triggers a few bugs here and there.
  • There needs to be a predictable, built-in way of running a clock per slide to sync custom expressions off of. In my presentation I used a hack of clocks that start once first invoked, but this lacks repeatability.
    I added a director.clock() method that gives you a clock per slide.

Finally, it doesn't take much imagination to imagine a MathBox Editor that would allow you to build diagrams visually rather than having to use code like I did. However, that's a can of worms I'm not going to open by myself, especially because the API is already quite straightforward to use, and the library itself is still a bit in flux. Perhaps this could be done as an extension of the Three.js editor.

You can see what MathBox is really capable of in the conference video. I invite you to play around with MathBox and see what you can make it do. Contributions are welcome, and the architecture is modular enough to allow its functionality to grow for quite some time.


How to Fold a Julia Fractal

A tale of numbers that like to turn

"Take the universe and grind it down to the finest powder and sieve it through the finest sieve and then show me one atom of justice, one molecule of mercy. And yet," Death waved a hand, "And yet you act as if there is some ideal order in the world, as if there is some… some rightness in the universe by which it may be judged."
The Hogfather, Discworld, Terry Pratchett

Mathematics has a dirty little secret. Okay, so maybe it's not so dirty. But neither is it little. It goes as follows:

Everything in mathematics is a choice.

You'd think otherwise, going through the modern day mathematics curriculum. Each theorem and proof is provided, each formula bundled with convenient exercises to apply it to. A long ladder of subjects is set out before you, and you're told to climb, climb, climb, with the promise of a payoff at the end. "You'll need this stuff in real life!", they say, oblivious to the enormity of this lie, to the fact that most of the educated population walks around with "vague memories of math class and clear memories of hating it."

Rarely is it made obvious that all of these things are entirely optional—that mathematics is the art of making choices so you can discover what the consequences are. That algebra, calculus, geometry are just words we invented to group the most interesting choices together, to identify the most useful tools that came out of them. The act of mathematics is to play around, to put together ideas and see whether they go well together. Unfortunately that exploration is mostly absent from math class and we are fed pre-packaged, pre-digested math pulp instead.

And so it also goes with the numbers. We learn about the natural numbers, the integers, the fractions and eventually the real numbers. At each step, we feel hoodwinked: we were only shown a part of the puzzle! As it turned out, there was a 'better' set of numbers waiting to be discovered, more comprehensive than the last.

Along the way, we feel like our intuition is mostly preserved. Negative numbers help us settle debts, fractions help us divide pies fairly, and real numbers help us measure diagonals and draw circles. But then there's a break. If you manage to get far enough, you'll learn about something called the imaginary numbers, where it seems sanity is thrown out the window in a variety of ways. Negative numbers can have square roots, you can no longer say whether one number is bigger than the other, and the whole thing starts to look like a pointless exercise for people with far too much time on their hands.

I blame it on the name. It's misleading for one very simple reason: all numbers are imaginary. You cannot point to anything in the world and say, "This is a 3, and that is a 5." You can point to three apples, five trees, or chalk symbols that represent 3 and 5, but the concepts of 3 and 5, the numbers themselves, exist only in our heads. It's only because we are taught them at such a young age that we rarely notice.

So when mathematicians finally encountered numbers that acted just a little bit different, they couldn't help but call them fictitious and imaginary, setting the wrong tone for generations to follow. Expectations got in the way of seeing what was truly there, and it took decades before the results were properly understood.

Now, this is not some esoteric point about a mathematical curiosity. These imaginary numbers—called complex numbers when combined with our ordinary real numbers—are essential to quantum physics, electromagnetism, and many more fields. They are naturally suited to describe anything that turns, waves, ripples, combines or interferes, with itself or with others. But it was also their unique structure that allowed Benoit Mandelbrot to create his stunning fractals in the late 70s, dazzling every math enthusiast that saw them.

Yet for the most part, complex numbers are treated as an inconvenience. Because they are inherently multi-dimensional, they defy our attempts to visualize them easily. Graphs describing complex math are usually simplified schematics that only hint at what's going on underneath. Because our brains don't do more than 3D natively, we can glimpse only slices of the hyperspaces necessary to put them on full display. But it's not impossible to peek behind the curtain, and we can gain some unique insights in doing so. All it takes is a willingness to imagine something different.

So that's what this is about. And a lesson to be remembered: complex numbers are typically the first kind of numbers we see that are undeniably strange. Rather than seeing a sign that says Here Be Dragons, Abandon All Hope, we should explore and enjoy the fascinating result that comes from one very simple choice: letting our numbers turn. That said, there are dragons. Very pretty ones in fact.

Like Hands on a Clock

What does it mean to let numbers turn? Well, when making mathematical choices, we have to be careful. You could declare that $ 1 + 1 $ should equal $ 3 $, but that only opens up more questions. Does $ 1 + 1 + 1 $ equal $ 4 $ or $ 5 $ or $ 6 $? Can you even do meaningful arithmetic this way? If not, what good are these modified numbers? The most important thing is that our rules need to be consistent for them to work. But if all we do is swap out the symbols for $ 2 $ and $ 3 $, we didn't actually change anything in the underlying mathematics at all.

So we're looking for choices that don't interfere with what already works, but add something new. Just like the negative numbers complemented the positives, and the fractions snugly filled the space between them—and the reals somehow fit in between that—we need to go look for new numbers where there currently aren't any.

We'll start with the classic real number line, marked at the integer positions, and poke around.
We imagine the line continues to the left and right indefinitely.

$$ \class{blue}{2} + \class{green}{3} = \class{red}{5} $$

But there's a problem with this visualization: by picturing numbers as points,
it's not clear how they act upon each other.
For example, the two adjacent numbers $ \class{blue}{2} + \class{green}{3} $ sum to $ \class{red}{5} $ …

$$ \class{blue}{2} + \class{green}{3} = \class{red}{5} $$
$$ \class{blue}{-2} + \class{green}{-1} = \class{red}{-3} $$

… but the similarly adjacent pair $ \class{blue}{-2} + \class{green}{-1} = \class{red}{-3} $.
We can't easily spot where the red point is going to be based on the blue and green.

A better solution is to represent our numbers using arrows instead, or vectors.
Each arrow represents a number through its length, pointing right/left for positive/negative.

The nice thing about arrows is that you can move them around without changing them.
To add two arrows, just lay them end to end. You can easily spot why $ \class{blue}{-2} + \class{green}{-1} = \class{red}{-3} $ …

… and why $ \class{blue}{2} + \class{green}{3} = \class{red}{5} $, similarly.
As long as we apply positives and negatives correctly, everything still works.

Now let's examine multiplication. We're going to start with $ \class{blue}{1} $ and then we'll multiply it by $ \class{green}{1.5} $ repeatedly.

$$ \times \class{green}{1.5} ... $$

With every multiplication, the vector gets longer by 50%.
These vectors represent the numbers $ \class{red}{1} $, $ \class{red}{1.5} $, $ \class{red}{2.25} $, $ \class{red}{3.375} $, $ \class{red}{5.0625} $, a nice exponential sequence.

Now we're going to do the same, but multiplying by the negative, $ \class{green}{-1.5} $, repeatedly.

$$ \times (\class{green}{-1.5}) ... $$

The vectors still grow by 50%, but they also flip around, alternating between positive and negative.
These vectors represent the sequence $ \class{red}{1} $, $ \class{red}{-1.5} $, $ \class{red}{2.25} $, $ \class{red}{-3.375} $, $ \class{red}{5.0625} $.

But there's another way of looking at this. What if instead of flipping from positive to negative, passing through zero, we went around instead, by rotating the vector as we're growing it?

We'd get the same numbers, but we've discovered something remarkable: a way to enter and pass through the netherworld around the number line. The question is, is this mathematically sound, or plain non-sense?

The challenge is to come up with a consistent rule for applying these rotations. We start with normal arithmetic. Multiplying by a positive didn't flip the sign, so we say we rotated by $ 0^\circ $. Multiplying by a negative flips the sign, so we rotated by $ \class{green}{180^\circ} $. The lengths are multiplied normally in both cases.

$$ \times \class{green}{1.5 \angle 90^\circ} ... $$

Now suppose we pick one of the in-between nether-numbers, say the vector of length $ 1.5 $, at a $ 90^\circ $ angle. What does that mean? That's what we're trying to find out! We'll write that as $ \class{green}{1.5 \angle 90^\circ} $ (1.5 at 90). It could make sense to say that multiplying by this number should rotate by $ \class{green}{90^\circ} $ while again growing the length by 50%.

This creates the spiral of points: $ \class{red}{1 \angle 0^\circ} $, $ \class{red}{1.5 \angle 90^\circ} $, $ \class{red}{2.25 \angle 180^\circ} $, $ \class{red}{3.375 \angle 270^\circ} $, $ \class{red}{5.0625 \angle 360^\circ} $. Three of those are normal numbers: $ +1 $, $ -2.25 $ and $ +5.0625 $, lying neatly on the real number line. The other two are new numbers conjured up from the void.

$$ \times \class{green}{1 \angle 45^\circ} ... $$

Let's examine this rotation more. We can pick $ 1 $ at a $ \class{green}{45^\circ} $ angle. Multiplying by a $ 1 $ probably shouldn't change a vector's length, which means we'd get a pure rotation effect.

By multiplying by $ \class{green}{1 \angle 45^\circ} $, we can rotate in increments of $ 45^\circ $.
It takes 4 multiplications to go from $ +1 $, around the circle of ones, and back to the real number $ -1 $.
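To make this concrete, here's a minimal numerical check using Python's standard cmath module, representing $ 1 \angle 45^\circ $ with cmath.rect (which takes the angle in radians):

```python
import cmath, math

step = cmath.rect(1, math.radians(45))   # the number 1∠45°

z = 1 + 0j                               # start at +1
for k in range(1, 5):
    z *= step
    r, phi = cmath.polar(z)
    print(k, round(r, 6), round(math.degrees(phi), 1))

# After 4 multiplications we sit at 1∠180°, i.e. the real number -1
# (up to floating point error).
print(z)
```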

And that's actually a remarkable thing, because it means our invented rule has created a square root of $ -1 $.
It's the number $ \class{green}{1 \angle 90^\circ} $.

$ (\class{green}{1 \angle 90^\circ})^2 = \class{blue}{-1} $

If we multiply it by itself, we end up at angle $ \class{green}{90} + \class{green}{90} = \class{blue}{180^\circ} $, which is $ \class{blue}{-1} $ on the real line.

But actually, the same goes for $ \class{green}{1 \angle 270^\circ} $.

$ (\class{green}{1 \angle 270^\circ})^2 = \class{blue}{-1} $

When we multiply it by itself, we end up at angle $ \class{green}{270} + \class{green}{270} = \class{blue}{540^\circ} $. But because we went around the circle once, that's the same as rotating by $ \class{blue}{180^\circ} $. So that's also equal to $ \class{blue}{-1} $.

$ (\class{green}{1 \angle -90^\circ})^2 = \class{blue}{-1} $

Or we could think of $ +270^\circ $ as $ -90^\circ $, and rotate the other way. It works out just the same. This is quite remarkable: our rule is consistent no matter how many times we've looped around the circle.

$ (\class{green}{1 \angle 90^\circ})^2 = \class{blue}{-1} $
$ (\class{green}{1 \angle 270^\circ})^2 = \class{blue}{-1} $

Either way, $ \class{blue}{-1} $ has two square roots, separated by $ 180^\circ $, namely $ \class{green}{1 \angle 90^\circ} $ and $ \class{green}{1 \angle 270^\circ} $.
This is analogous to how both $ 2 $ and $ -2 $ are square roots of $ 4 $.

$$ \class{blue}{a} \cdot \class{green}{b} = \class{red}{c}$$

Complex multiplication can then be summarized as: angles add up, lengths multiply, taking care to preserve clockwise and counterclockwise angles. Above, we multiply two arbitrary complex numbers $ \class{blue}{a} $ and $ \class{green}{b} $ to get $ \class{red}{c} $.
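As a sketch of that rule in Python, here's a hand-rolled polar multiply checked against the built-in complex product (the specific values of $ a $ and $ b $ are arbitrary):

```python
import cmath, math

def mul_polar(a, b):
    """Multiply two complex numbers the geometric way:
    lengths multiply, angles add."""
    ra, pa = cmath.polar(a)
    rb, pb = cmath.polar(b)
    return cmath.rect(ra * rb, pa + pb)

a = cmath.rect(2.0, math.radians(30))    # 2∠30°
b = cmath.rect(1.5, math.radians(-75))   # 1.5∠-75°

print(mul_polar(a, b))   # 3∠-45°
print(a * b)             # the built-in product agrees
```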

When we start changing the vectors, c turns along, being tugged by both a and b's angles, wrapping around the circle, while its length changes. Hence, complex numbers like to turn, and it's this rule that separates them from ordinary vectors.

We can then picture the complex plane as a grid of concentric circles. There's a circle of ones, a circle of twos, a circle of one-and-a-halfs, etc. Each number comes in many different versions or flavors, one positive, one negative, and infinitely many others in between, at arbitrary angles on both sides of the circle.

Which brings us to our reluctant and elusive friend, $ \class{blue}{i} $. This is the proper name for $ \class{blue}{1 \angle 90^\circ} $, and the way complex numbers are normally introduced: $ i^2 = -1 $. The magic is that we can put a complex number anywhere a real number goes, and the math still works out, oddly enough. We get complex answers about complex inputs.

Complex numbers are then usually written as the sum of their (real) X coordinate, and their (imaginary) Y coordinate, much like ordinary 2D vectors. But this is misleading: the ugly number $ \class{red}{\frac{\sqrt{3}}{2} + \frac{1}{2}i } $ is actually just $ \class{green}{1 \angle 30^\circ} $ in disguise, and it acts more like a $ 1 $ than a $ \frac{1}{2} $ or $ \frac{\sqrt{3}}{2} $. While knowing how to convert between the two is required for any real calculations, you can cheat by doing it visually.

But looking at individual vectors only gets us so far. We study functions of real numbers by looking at a graph that shows us every output for every input. To do the same for complex numbers, we need to understand how these numbers-that-like-to-turn, this field of vectors, change as a whole.
Note: from now on, I'll put 0° at the 12 o'clock position for simplicity.

When we apply a square root, each vector shifts. But really, it's the entire fabric of the complex plane that's warping. Each circle has been squeezed into a half-circle, because all the angles have been halved—the opposite of squaring, i.e. doubling the angle. The lengths have had a normal square root applied to them, compressing the grid at the edges and bulging it in the middle.

But remember how every number had two opposite square roots? This comes from the circular nature of complex math. If we take a vector and rotate it $ 360 ^\circ $, we end up in the same place, and the two vectors are equal. But after dividing the angles in half, those two vectors are now separated by only $ 180 ^\circ $ and lie on opposite ends of the circle. In complex math, they can both emerge.

Complex operations are then like folding or unfolding a piece of paper, only it's weird and stretchy and circular. This can be hard to grasp, but is easier to see in motion. To help see what's going on, I've cut the disc and separated the positive from the negative angles in 3D.

When we square our numbers to undo the square root, the angles double, folding the plane in on itself. The lengths are also squared, restoring the grid spacing to normal.

After squaring, each square root has now ended up on top of its identical twin, and we can merge everything back down to a flat plane. Everything matches up perfectly.

Thus the square root actually looks like this. New numbers flow in from the 'far side' as we try and shear the disc apart. The complex plane is stubborn and wants to stay connected, and will fold and unfold to ensure this is always the case. This is one of its most remarkable properties.

There's no limit to this folding or unfolding. If we take every number to the fourth power, angles are multiplied by four, while lengths are taken to the fourth power. This results in 4 copies of the plane being folded into one.

However, things are not always so neat. What happens if we were to take everything to an irrational power, say $ \frac{1}{\sqrt{2}} $? Angles get multiplied by $ 0.707106... $, which means a rotation of $ 360^\circ $ now becomes $ \sim 254.56^\circ $.

Because $ \frac{1}{\sqrt{2}} $ is irrational, no whole number of extra turns ever lands back on a multiple of $ 360^\circ $, so the circular grid never matches up with itself again no matter how far we extend it. Hence, this operation splits a single unique complex number into an infinite number of distinct copies.
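A small numeric sketch of why, tracking the number of extra whole turns as an explicit parameter $ k $ (each $ k $ describes the same input, wound around the circle one more time):

```python
import math

p = 1 / math.sqrt(2)   # an irrational exponent
theta = 40.0           # an arbitrary starting angle, in degrees

# Winding around k extra times before taking the power lands the
# result at a different angle each time; with p irrational, the
# resulting angles (mod 360) never repeat.
for k in range(6):
    angle = (p * (theta + 360 * k)) % 360
    print(k, round(angle, 4))
```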

For any irrational power $ p $, there are an infinite number of solutions to $ z^p = c $, all lying on a circle. For a hint as to why this is so, we can look at Taylor series: an arbitrary function $ f(z) $ can be written as an infinite series $ a + bz + cz^2 + dz^3 + ... \,$ When z is complex, such a sum doesn't just represent a finite amount of folds, but a mindboggling infinite origami of complex space.

We've seen how complex numbers are arrows that like to turn, which can be made to behave like numbers: we can add and multiply them, because we can come up with a consistent rule for doing so. We've also seen what powers of complex numbers look like: we fold or unfold the entire plane by multiplying or dividing angles, while simultaneously applying a power to the lengths.

Pulling a Dragon out of a Hat

With a basic grasp of what complex numbers are and how they move, we can start making Julia fractals.

At their heart lies the following function:

$$ f(z) = z^2 + c $$

This says: map the complex number $ z $ onto its square, and then add a constant number to it. To generate a Julia fractal, we have to apply this formula repeatedly, feeding the result back into $ f $ every time.

$$ z_{n+1} = (z_n)^2 + c $$

We want to examine how $ z_n $ changes when we plug in different starting values for $ z_1 $ and iterate $ n $ times. So let's try that and see what happens.
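As a sketch of that experiment in Python: the constant $ c $ and the starting points below are arbitrary choices, and the bailout radius of $ 2 $ anticipates the region of interest discussed next.

```python
def julia_orbit(z, c, n=50, bailout=2.0):
    """Iterate f(z) = z*z + c, returning the step at which z escapes
    the bailout radius, or None if it stays put for all n steps."""
    for i in range(n):
        if abs(z) > bailout:
            return i
        z = z * z + c
    return None

c = complex(-0.4, 0.6)            # a classic Julia constant
print(julia_orbit(0j, c))         # stays bounded: None
print(julia_orbit(1.5 + 1j, c))   # escapes almost immediately: 1
```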

Our region of interest is the disc of complex numbers less than $ 2 $ in length. I've marked the circle of ones as a reference.

We take an arbitrary set of numbers, like this grid, and start applying the formula $ f(z) = z^2 + c $ to each. Rather than use vectors, I'll just draw points, to avoid cluttering the diagram.

First we square each number. That is, their lengths are squared, their angles are doubled.
The squaring has a dual effect: numbers larger than $ 1 $ grow bigger and are pushed outwards, numbers less than $ 1 $ grow smaller and are pulled inwards.

Next, we reset the grid back to neutral, keeping the numbers in their new place.
We also pick a random value for the constant $ \class{green}{c} $, e.g. $ \class{green}{0.57 \angle 59^\circ} $.

Now we add $ \class{green}{c} $ to each point, completing one round of Julia iteration, $ f(z) = z^2 + c $. As a result, some numbers have ended up closer towards the origin (i.e. $ 0 $), others further away from it. The combination of folding + shifting has had a non-obvious effect on the numbers.

We begin the second iteration and square each number again. Any number not inside the critical circle of $ 1 $ in the middle will get pushed out again. The other numbers continue to linger in the middle.

If we zoom out, we can see the larger numbers are spiralling outwards and are permanently lost. The minor nudge by $ \class{green}{c} $ won't be enough to bring them back.

Others remain in the middle, being drawn in, but are also at risk of being pushed out of the circle by $ \class{green}{c} $.

Resetting the grid again, we add the same value $ \class{green}{c} $ to our vectors again to finish. At this point, our original grid of numbers has been completely jumbled up.

If we continued this process, would any numbers remain in the middle? Or would they eventually all get flung out? Unfortunately it's very hard to see what's going on while iterating forwards, because we lose track of where each point came from.

So we're going to go backwards instead. We'll establish a safe-zone of all numbers less than $ 2 $, forming a solid disc of all the ones which aren't irretrievably lost. We want to know where all these numbers can possibly come from.

First we have to shift the numbers again, this time in the opposite direction to subtract $ c $.

Now we apply the square root to find $ z_{n-1} = \pm \sqrt{z_n - c} $, which is a Julia iteration in reverse.

After one backwards iteration, the disc has been squished down into an oval at an angle.
These are all the points that will definitely stay in the middle after one iteration.
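Incidentally, iterating backwards like this is a standard way to plot Julia sets numerically, known as inverse iteration: pick either square root at random each step, and the points settle onto the set. A rough Python sketch (the starting point and sample count are arbitrary):

```python
import cmath, random

def inverse_julia(c, n=20000, seed=1):
    """Approximate the Julia set by iterating z -> ±sqrt(z - c),
    choosing the sign of the root at random each step."""
    random.seed(seed)
    z = complex(1, 1)        # any starting point works after warm-up
    points = []
    for i in range(n):
        z = cmath.sqrt(z - c)
        if random.random() < 0.5:
            z = -z
        if i > 50:           # skip the transient before z reaches the set
            points.append(z)
    return points

pts = inverse_julia(complex(-0.4, 0.6))
print(len(pts), pts[:3])     # scatter-plot these to see the fractal
```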

When we apply the second iteration, a pattern starts to develop. Because of the repeated unfolding, we create two bulges wherever there was previously only one.

At the same time, the square root alters the length of each number as well. As a result, we squeeze in the radial direction, scaling down earlier features as they combine with newly created ones.

After 4 iterations, we start to see the first hints of self-similarity. The shape's lobes are sprouting into spirals.

But all we've really done is narrow down our blue safe-zone to include only those points that 'survive' up to 5 Julia iterations.

Remarkably this seems to distort the fractal evenly. This is not a coincidence. Complex operations are indeed stubborn, in that they all preserve right angles everywhere. To do so, the mapping must act like a pure scaling and rotation at every point, without shearing off in any particular direction. This is what allows the fractal to look like itself at different scales.

Skipping ahead to iteration 12, we've now most definitely abandoned the realm of neat, traditional geometry.
Despite curving wildly, the total mapping $ z_{12} $ still has this property of evenness, which is properly referred to as a conformal mapping.

After 128 iterations, we end up with this intricate dragon-like shape, approximating the safe zone for the true fractal map $ z_\infty $. The numbers that make up the blue area are the hardiest points that will survive the next 127 attempts on their life. All the others will definitely get flung out.

Yet this complicated shape is merely the result of folding over and over again, adding a simple constant in between. If we perform a forwards Julia iteration, i.e. squaring and shifting, we see this shape matches up with itself, and looks identical before and after.

For different values of $ c $, the fractal morphs into other shapes. There's literally an infinite variety to discover. Some sets are made up of disconnected parts. In this case, $ |c| $ is large enough to push the solid disc away from the center in a single iteration, but not so far that some points can't fold back in. If $ |c| $ gets much larger, the set vanishes.

For a smaller $ c $, Julia sets are solid. Even a small shift in the value of $ c $ can accumulate into a large difference. Here we zoom in on some fluffy clouds right outside the 'solid' zone.

This area of fractal space is dubbed Seahorse Valley, for rather obvious reasons.

And we can even make snowflakes. The dramatic changes due to $ c $ reveal the chaotic nature of fractals. Mathematically, chaos occurs when even the tiniest change can accumulate and blow up to an arbitrarily large effect.

If we change our iteration formula, for example to a fourth power $ f(z) = z^4 + c $, the entire shape changes. Because each iteration now turns one bulge into four, the resulting shape has four-fold rotational symmetry.

Again, different values of $ c $ make different shapes, precipitating dramatic changes.

Finally, there's the Mandelbrot set. This is similar to a Julia set, but the formula is applied differently. Effectively, we swap $ c $ and the initial value $ z_1 $ and set $ z_1 $ to 0. Hence, $ c $ is no longer constant, and the mapping stops being a simple folding operation. Each iteration is now unique (and not so easy to visualize).

Because the Mandelbrot set traverses all possible values of $ c $ across its surface, it has a part of every associated Julia set in it. Around any number $ c $ it looks like the Julia set which has that value as its constant.
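A sketch of the corresponding membership test in Python (the iteration count and bailout radius are the usual arbitrary cutoffs):

```python
def in_mandelbrot(c, n=100, bailout=2.0):
    """The Mandelbrot recipe: z starts at 0 and c is the point tested."""
    z = 0j
    for _ in range(n):
        z = z * z + c
        if abs(z) > bailout:
            return False
    return True

print(in_mandelbrot(complex(-0.5, 0)))   # True: inside the main cardioid
print(in_mandelbrot(complex(1, 1)))      # False: escapes within two steps
```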

However, the Mandelbrot set also contains copies of itself, buried inside its edge. As a result, deep Mandelbrot zooms can reach astonishing levels of beauty and complexity. This is best done with specialized software that can calculate with hundreds of digits of precision.

Making fractals is probably the least useful application of complex math, but it's an undeniably fascinating one. It also reveals the unique properties of complex operations, like conformal mapping, which provide a certain rigidity to the result.

However, in order to make complex math practical, we have to figure out how to tie it back to the real world.

Travelling without Moving

It's a good thing we don't have to look far to do so. Whenever we're describing wavelike phenomena, whether it's sound, electricity or subatomic particles, we're also interested in how the wave evolves and changes. Complex operations are eminently suited for this, because they naturally take place on circles. Numbers that oppose can cancel out, numbers in the same direction will amplify each other, just like two waves do when they meet. And by folding or unfolding, we can alter the frequency of a pattern, doubling it, halving it, or anything in between.

More complicated operations are used, for example, to model electromagnetic waves, whether they are FM radio, wifi packets or ADSL streams. This requires precise control of the frequencies you're generating and receiving. Doing it without complex numbers, well, it just sucks. So why use boring real numbers, when complex numbers can do the work for you?

$$ w(x) = \sin(x) $$

Take for example a sine wave $ w(x) $.

$$ w(x, t) = \sin(x - t) $$ $$ \class{blue}{\frac{\partial w(x, t)}{\partial t}} $$

For the wave to propagate across a distance, its values have to ripple up and down over time.
The rate of change over time is drawn on top. This is the vertical velocity at every point. Both the wave and its rates of change undergo a complicated numerical dance.

$$ w(x, t) = \sin(x - t) $$ $$ \class{blue}{\frac{\partial w(x, t)}{\partial t}} \,\, \class{green}{\frac{\partial^2 w(x, t)}{\partial t^2}} $$

But to properly describe this motion, we have to go one level deeper. We have to examine the rate of change of the vertical velocity of the wave. This is its vertical acceleration. We see that green vectors tug on blue vectors as blue vectors tug on the wave.

$$ w(x, t) = \sin(x - t) $$ $$ \class{green}{\frac{\partial^2 w(x, t)}{\partial t^2}} = \,? $$

It's easier to see what's going on if we center the vectors vertically. The acceleration appears to be equal but opposite to the wave itself.

$$ w(x, t) = \sin(x - t) + 1 $$ $$ \class{green}{\frac{\partial^2 w(x, t)}{\partial t^2}} = \,? $$

But that's just a lucky coincidence. If we shift the wave up by one unit, its opposite shifts down by a unit. Yet its velocity and acceleration are unaltered. So acceleration is not simply the opposite of the wave.

$$ w(x, t) = \sin(x - t) + 1 $$ $$ \class{green}{\frac{\partial^2 w(x, t)}{\partial t^2}} = \,? $$

What's actually going on is that the green vectors match the curvature of the wave: positive inside valleys, negative on top of crests. Intuitively, this can be explained by saying that waves tend to bounce towards an average level: this is going to pull the value up out of valleys and down from peaks.

$$ w(x, t) = \sin(x - t) $$ $$ \class{green}{\frac{\partial^2 w(x, t)}{\partial t^2}} = \class{red}{\frac{\partial^2 w(x, t)}{\partial x^2}} $$

But curvature is the rate of change of the slope, and slope is the rate of change over a distance. So to describe real waves, we need to relate 'second level' change over time to 'second level' change over distance, taking two derivatives on each side. This is Complicated with a capital C.

Let's try this with complex numbers instead. Until now, we had a 2D graph, showing the real value of the wave over real distance. We're going to make the wave's value complex. Mapping a 1D number (distance) to a 2D number (the wave function), means we need a 3D diagram.

The complex plane is mapped into the old Y direction (real) and the new Z direction (imaginary).

$$ w(x) = (1 \angle x) $$

To make a complex wave, we do the thing complex numbers are best at: we make them turn, and make a helix. In this case, our wave function is simply the variable number $ 1 \angle x $ , a constant length with a smoothly changing rotation over distance.
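Sampling this helix takes one line per point; a small Python sketch, again using cmath.rect for $ 1 \angle x $ (angle in radians):

```python
import cmath, math

def w(x):
    """The complex wave 1∠x: constant unit length, angle x."""
    return cmath.rect(1.0, x)

# x runs along the axis while the value spins around it.
for x in [0, math.pi / 2, math.pi, 3 * math.pi / 2]:
    v = w(x)
    print(round(x, 3), round(v.real, 3), round(v.imag, 3))
```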

$$ w(x, t) = (1 \angle x) \cdot (1 \angle t) = 1 \angle (x + t) $$ $$ \class{blue}{\frac{\partial w(x, t)}{\partial t}} = \,? $$

To make the wave move, we can simply twist it in-place. Which we now know is the same as multiplying by $ 1 \angle t $. If we plot the complex velocity of each point, at first sight this might not look any simpler than the real wave. But in fact, these vectors are not changing in length at all, unlike the real version. Just like the wave they pull on, they undergo a pure rotation.

$$ w(x, t) = 1 \angle (x + t) $$ $$ \class{blue}{\frac{\partial w(x, t)}{\partial t}} = i \cdot w(x, t) $$

At all times, the velocity is offset by $ 90^\circ $ from the wave itself. And that means that described in complex numbers, wave equations are super easy. Instead of involving two derivatives, i.e. the rate of rate of change, we only need one. There is a direct relationship between a value and its rate of change. The necessary rotation by $ 90^\circ $ can then be written simply as multiplying by $ i $.

To recover a real wave from a complex wave, we can simply flatten it back to 2D, discarding the imaginary part. By using complex numbers to describe waves, we give them the power to rotate in place without changing their amplitude, which turns out to be much simpler.
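We can sanity-check both claims numerically: a centered finite difference for the time derivative should match $ i \cdot w $, and the real part should match a plain cosine (the step size $ h $ is an arbitrary small choice):

```python
import cmath, math

def w(x, t):
    return cmath.rect(1.0, x + t)    # the wave 1∠(x+t)

x, t, h = 0.7, 0.3, 1e-6
dw_dt = (w(x, t + h) - w(x, t - h)) / (2 * h)   # numerical rate of change
print(dw_dt)           # matches...
print(1j * w(x, t))    # ...i times the wave itself

# Flattening back to a real wave = keeping the real part.
print(w(x, t).real, math.cos(x + t))
```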

$$ \frac{1}{2} \cdot ( \class{blue}{ 1 \angle (x + t) } + \class{green}{ 1 \angle (-x - t) } ) = \cos(x + t)$$

In fact, flattening the wave has a perfectly reasonable complex interpretation: it's what happens when we average out a counter-clockwise wave (positive frequency) with a clockwise wave (negative frequency). By twisting each in opposite directions, the combined wave travels along, locked to the real number line.

But if we add up two arbitrary complex frequencies, their sum immediately turns into a spirograph pattern that manages to evolve and propagate, even as it just rotates in place. Though the original waves both had a constant amplitude of $ 1 $, the relative differences in angles (i.e. the phase) allows them to cancel out in surprising ways.

Neither curve is actually moving forward: they're just spinning in place, creating motion anyway. This is actually what quantum superposition looks like, where two or more complex probability waves combine and interfere. Where the combined wave cancels out to zero, that's where two separate possible states are simultaneously cancelling each other out. The fact that the underlying numbers are complex doesn't prevent them from describing real physics; indeed, it seems that's how nature actually works.

The End Is Just The Beginning

In visualizing complex waves, we've seen functions that map real numbers to complex numbers, and back again. These can be graphed easily in 3D diagrams, from $ \mathbb{R} $ to $ \mathbb{C} $ or vice-versa. You cross 1 real dimension with the 2 dimensions of the complex plane.

But complex operations in general work from $ \mathbb{C} $ to $ \mathbb{C} $. To view these, unfortunately you need four-dimensional eyes, which nature has yet to provide. There are ways to project these graphs down to 3D that still somewhat make sense, but it never stops being a challenge to interpret them.

For every mathematical concept that we have a built-in intuition for, there are countless more we can't picture easily. That's the curse of mathematics, yet at the same time, also its charm.

Hence, I tried to stick to the stuff that is (somewhat!) easy to picture. If there's interest, a future post could cover topics like: the nature of $ e^{ix} $, Fourier transforms, some actual quantum mechanics, etc.

For now, this story is over. I hope I managed to spark some light bulbs here and there, and that you enjoyed reading it as much as I did making it.

Comments, feedback and corrections are welcome on Google Plus.

For extra credit: check out these great stirring visualizations of Julia and Mandelbrot sets. I incorporated a similar graphic above. Hat tip to Tim Hutton for pointing these out.

And for some actual paper mathematical origami, check out Vihart's latest video on Snowflakes, Starflakes and Swirlflakes.

To Infinity… And Beyond!

Exploring the outer limits

To Infinity and Beyond
“It is known that there are an infinite number of worlds, simply because there is an infinite amount of space for them to be in. However, not every one of them is inhabited. Therefore, there must be a finite number of inhabited worlds.

Any finite number divided by infinity is as near to nothing as makes no odds, so the average population of all the planets in the universe can be said to be zero. From this it follows that the population of the whole universe is also zero, and that any people you may meet from time to time are merely the products of a deranged imagination.”

If there's one thing mathematicians have a love-hate relationship with, it has to be infinity. It's the ultimate tease: it beckons us to come closer, but never allows us anywhere near it. No matter how far we travel to impress it, infinity remains disinterested, equally distant from everything: infinitely far!

$$ 0 < 1 < 2 < 3 < … < \infty $$

Yet infinity is not just desirable, it is absolutely necessary. All over mathematics, we find problems for which no finite amount of steps will help resolve them. Without infinity, we wouldn't have real numbers, for starters. That's a problem: our circles aren't round anymore (no $ π $ and $ \tau $) and our exponentials stop growing right (no $ e $). We can throw out all of our triangles too: most of their sides have exploded.

Steel railroad bridge with a 1200 ton counter-weight, Cape Cod Canal, Buzzard's Bay.
Completed in 1910. Source: Library of Congress.

We like infinity because it helps avoid all that. In fact even when things are not infinite, we often prefer to pretend they are—we do geometry in infinitely big planes, because then we don't have to care about where the edges are.

Now, suppose we want to analyze a steel beam, because we're trying to figure out if our proposed bridge will stay up. If we want to model reality accurately, that means simulating each individual particle, every atom in the beam. Each has its own place and pushes and pulls on others nearby.

But even just $ 40 $ grams of pure iron contains $ 4.31 \cdot 10^{23} $ atoms. That's an inordinate amount of things to keep track of for just 1 teaspoon of iron.
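The arithmetic behind that count, assuming iron's molar mass of roughly $ 55.845 $ g/mol and Avogadro's number:

```python
# 40 g of iron, at ~55.845 g/mol and ~6.022e23 atoms per mol.
atoms = 40 / 55.845 * 6.022e23
print(f"{atoms:.2e}")   # ~4.31e+23 atoms
```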

Instead, we pretend the steel is solid throughout. Rather than being composed of atoms with gaps in between, it's made of some unknown, filled in material with a certain density, expressed e.g. as grams per cubic centimetre. Given any shape, we can determine its volume, and hence its total mass, and go from there. That's much simpler than counting and keeping track of individual atoms, right?

Unfortunately, that's not quite true.

The Shortest Disappearing Trick Ever

Like all choices in mathematics, this one has consequences we cannot avoid. Our beam's density is mass per volume. Individual points in space have zero volume. That would mean that at any given point inside the beam, the amount of mass there is $ 0 $. How can a beam that is entirely composed of nothing be solid and have a non-zero mass?


Bam! No more iron anywhere.

While Douglas Adams was being deliberately obtuse, there's a kernel of truth there, which is a genuine paradox: what exactly is the mass of every atom in our situation?

To make our beam solid and continuous, we had to shrink every atom down to an infinitely small point. To compensate, we had to create infinitely many of them. Dividing the finite mass of the beam between an infinite amount of atoms should result in $ 0 $ mass per atom. Yet all these masses still have to add up to the total mass of the beam. This suggests $ 0 + 0 + 0 + … > 0 $, which seems impossible.

If the mass of every atom were not $ 0 $, and we have infinitely many points inside the beam, then the total mass is infinity times the atomic mass $ m $. Yet the total mass is finite. This suggests $ m + m + m + … < \infty $, which also doesn't seem right.

It seems whatever this number $ m $ is, it can't be $ 0 $ and can't be non-zero. It's definitely not infinite, we only had a finite mass to begin with. It's starting to sound like we'll have to invent a whole new set of numbers again to even find it.

That's effectively what Isaac Newton and Gottfried Leibniz set in motion at the end of the 17th century, when they both discovered calculus independently. It was without a doubt the most important discovery in mathematics and resulted in formal solutions to many problems that were previously unsolvable— our entire understanding of physics has relied on it since. Yet it took until the late 19th century for the works of Augustin Cauchy and Karl Weierstrass to pop up, which formalized the required theory of convergence. This allows us to describe exactly how differences can shrink down to nothing as you approach infinity. Even that wasn't enough: it was only in the 1960s when the idea of infinitesimals as fully functioning numbers—the hyperreal numbers—was finally proven to be consistent enough by Abraham Robinson.

But it goes back much further. Ancient mathematicians were aware of problems of infinity, and used many ingenious ways to approach it. For example, $ π $ was found by considering circles to be infinite-sided polygons. Archimedes' work is likely the earliest use of indivisibles, using them to imagine tiny mechanical levers and find a shape's center of mass. He's better known for running naked through the streets shouting Eureka! though.

That it took so long shows that this is not an easy problem. The proofs involved are elaborate and meticulous, all the way back. They have to be, in order to nail down something as tricky as infinity. As a result, students generally learn calculus through the simplified methods of Newton and Leibniz, rather than the most mathematically correct interpretation. We're taught to mix notations from 4 different centuries together, and everyone's just supposed to connect the dots on their own. Except the trail of important questions along the way is now overgrown with jungle.

Still, it shows that even if we don't understand the whole picture, we can get a lot done. This article is in no way a formal introduction to infinitesimals. Rather, it's a demonstration of why we might need them.

What is happening when we shrink atoms down to points? Why does it make shapes solid yet seemingly hollow? Is it ever meaningful to write $ x = \infty $? Is there only one infinity, or are there many different kinds?

To answer that, we first have to go back to even simpler times, to Ancient Greece, and start with the works of Zeno.

Achilles and the Tortoise

Zeno of Elea was one of the first mathematicians to pose these sorts of questions, effectively trolling mathematics for the next two millennia. He lived in the 5th century BC in southern Italy, although only second-hand references survive. In his series of paradoxes, he examines the nature of equality, distance, continuity, of time itself.

Because it's the ancient times, our mathematical knowledge is limited. We know about zero, but we're still struggling with the idea of nothing. We've run into negative numbers, but they're clearly absurd and imaginary, unlike the positive numbers we find in geometry. We also know about fractions and ratios, but square roots still confuse us, even though our temples stay up.

So the story goes: the tortoise challenges Achilles to a footrace.

"If you give me a head start," it says, "any start at all, you can never win.".
Achilles laughs and decides to be a good sport: he'll only run twice as fast as the tortoise.

The tortoise explains: "If you want to pass me, first you have to move to where I am. By the time you get there, I'll have walked ahead a little bit."

"While you cross the next distance, I will move yet again. No matter how many times you try to catch up, I'll always be some small distance ahead. Therefor, you cannot beat me."

Achilles realizes that talking tortoises are not a sign of positive mental health, so he decides to find a wall to run into instead. It will either confirm the theory, or end the pain.

See, the race is actually unnecessary, because the problem remains the same.
In order to reach the wall, Achilles first has to cross half the way there.

Then he has to go half that distance again, and again. No matter how many times he repeats this, there will always be some distance left. So if Achilles can't cross this distance in a finite amount of steps, why is he wearing that stupid helmet?

The distance travelled forms a never ending sequence of expanding sums.
We have to examine the entire sequence, rather than individual numbers in it.

By definition, the distance travelled and distance to the wall always add up to $ 1 $. So one simple way to resolve this conundrum is to say: Well yes, it's going to take you infinitely long to glue all those pieces together, but only because you already spent an infinite amount of time chopping them up!
But that's not a very mathematically satisfying answer. Let's try something else.

The distance to the wall is always equal to the last step taken. We know that each step is half as long as the previous one, starting with $ \frac{1}{2} $. Therefore, the distance to the wall must decrease exponentially: $ \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, … $, getting closer to zero with every step.

But why can we say that this gap effectively closes to zero after 'infinity steps'? The number that we're building up is $ \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + … \,$

We know our sum will never exceed $ 1 $, as there is only $ 1 $ unit of distance being divided. This means $ \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + … \leq 1 $, which eliminates every number past the surface of the wall—but not the surface itself.

Suppose that $ \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + … < 1 $, and hence that this number lies some tiny distance in front of the wall.

Well in that case, all we need to do is zoom in far enough, and we'll see our sequence jump past it after a certain finite number of steps.

If we try to move it closer to the wall, the same thing happens. This number simply cannot be less than $ 1 $. Therefore $ \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + … \geq 1 $.

The only place $ \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + … \, $ can be is exactly $ 0 $ units away from $ 1 $. If two numbers have zero distance between them, then they are equal.
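A few exact partial sums, computed with Python's Fraction type, make the vanishing gap visible:

```python
from fractions import Fraction

total, step = Fraction(0), Fraction(1, 2)
for n in range(1, 11):
    total += step
    step /= 2
    print(n, total, 1 - total)   # the gap to the wall halves every step
```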

What we've actually done here is applied the principle of limits: we've defined a procedure of steps that lets us narrow down the interval where the infinite sum might be. The lower bound is the sequence of sums itself: it only increases towards $ 1 $, never decreases. For the upper bound, we established no sum could exceed $ 1 $. Therefore the interval must shrink to nothing, and the sequence converges.

$$ \lim_{n \to +\infty} x_n = \mathop{\class{no-outline}{►\hspace{-2pt}►}}_{\infty\hspace{2pt}} x_n $$

The purpose of a limit is then to act as a supercharged fast-forward button. It lets us avoid the infinite amount of work required to complete sums like $ \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + … $ and simply skip to the end. To do so, we have to step back, spot the pattern, and pin down where it ends. So limits allow us to literally reach the unreachable. But in fact, you already knew that.

$$ \frac{2}{3} = 0.66666… $$
$$ 0.6 + 0.06 + 0.006 + …\hspace{2pt} $$

As soon as you learned to divide, you found $ 2 \div 3 = 0.666… = 0.6 + 0.06 + 0.006 + …\hspace{2pt} $
Even in primary school the opportunity to examine infinity is there. Rather than tackle it head on, it's simply noted and filed. Eight years later it's regurgitated in the form of cryptic epsilon-delta definitions.

$$ 1 - 1 + 1 - 1 + 1 … $$

But then there's those pesky consequences again. By allowing the idea of infinity, we can invent an entire zoo of paradoxical things. For example, imagine a lamp that's switched on ($1$) and off ($0$) at intervals that decrease by a factor of two: on for $ \frac{1}{2} $ second, off for $ \frac{1}{4} s $, on for $ \frac{1}{8} s $, off for $ \frac{1}{16} s $, …
After $ 1\,s $, when the switch has been flipped an infinite amount of times, is the lamp on or off?

$$ (1 - 1) + (1 - 1) + (1 - 1) + … = 0 \,? $$

$$ 1 + (-1 + 1) + (-1 + 1) + … = 1 \,? $$

Another way to put this is that the lamp's state at $ 1\,s $ is the result of the infinite sum $ 1 - 1 + 1 - 1 + … $
Intuitively we might say each pair of $ +1 $ and $ -1 $ should cancel out and make the entire sum equal to $ 0 $.
But we can pair them the other way, leading to $ 1 $ instead. It can't be both.

If we zoom in, it's obvious that no matter how close we get to $ 1\,s $, the lamp's state keeps switching. Therefore it's meaningless to attempt to 'fast forward' to the end, and the limit does not exist. At $ 1\,s $ the lamp is neither on nor off: it's undefined. This infinite sum does not converge.
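The non-convergence is plain to see in the partial sums:

```python
partial, total = [], 0
for n in range(10):
    total += (-1) ** n        # +1, -1, +1, -1, ...
    partial.append(total)
print(partial)   # [1, 0, 1, 0, ...]: forever flipping, so no limit exists
```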

But actually, we overcomplicated things. Thanks to the power of limits, we can ask a simpler, equivalent question. Given a lamp that switches on and off every second, what is its state at infinity? The answer's the same: it never settles.

Limits are the first tool in our belt for tackling infinity. Given a sequence described by countable steps, we can attempt to extend it not just to the end of the world, but literally forever. If this works, we end up with a finite value. If not, the limit is undefined. A limit can be equal to $ \infty $, but that's just shorthand for saying the sequence has no upper bound. Negative infinity means no lower bound.

Breaking Away From Rationality

Until now we've only encountered fractions, that is, rational numbers. Each of our sums was made of fractions. The limit, if it existed, was also a rational number. We don't know whether this was just a coincidence.

It might seem implausible that a sequence of numbers that is 100% rational and converges can approach a limit that isn't rational at all. Yet we've already seen similar discrepancies. In our first sequence, every partial sum was less than $ 1 $. Meanwhile the limit of the sum was equal to $ 1 $. Clearly, the limit does not have to share all the properties of its originating sequence.

We also haven't solved our original problem: we've only chopped things up into infinitely many finite pieces. How do we get to infinitely small pieces? To answer that, we need to go looking for continuity.

Generally, continuity is defined by what it is and what its properties are: a noticeable lack of holes, and no paradoxical values. But that's putting the cart before the horse. First, we have to show which holes we're trying to plug.

Let's imagine the rational numbers.

Actually, hold on. Is this really a line? The integers certainly weren't connected.

Rather than assume anything, we're going to attempt to visualize all the rational numbers. We'll start with the numbers between $ 0 $ and $ 1 $.

$$ \class{blue}{\frac{0 + 1}{2}} $$

Between any two numbers, we can find a new number in between: their average. This leads to $ \frac{1}{2} $.

$$ \frac{a + b}{2} $$

By repeatedly taking averages, we keep finding new numbers, filling up the interval.

If we separate out every step, we get a binary tree.

You can think of this as a map of all the fractions of $ 2^n $. Given any such fraction, say $ \frac{13}{32} = \frac{13}{2^5} $, there is a unique path of lefts and rights that leads directly to it. At least, as long as it lies between $ 0 $ and $ 1 $.

Note that the graph resembles a fractal and that the distance to the top edge is divided in half with every step. But we only ever explore a finite amount of steps. Therefore, we are not taking a limit and we'll never actually touch the edge.

$$ \frac{2 \cdot a + b}{3} $$
$$ \frac{a + 2 \cdot b}{3} $$

But we can take thirds as well, leading to fractions with a power of $ 3^n $ in their denominator.

As some numbers can be reached in multiple ways, we can eliminate some lines, and end up with this graph, where every number sprouts into a three-way, ternary tree. Again, we have a map that gives us a unique path to any fraction of $ 3^n $ in this range, like $ \frac{11}{27} = \frac{11}{3^3} $.

$$ \frac{21}{60} = \frac{21}{2^2 \cdot 3 \cdot 5} $$

Because we can do this for any denominator, we can define a way to get to any rational number in a finite amount of steps. Take for example $ \frac{21}{60} $. We decompose its denominator into prime numbers and begin with $ 0 $ and $ 1 $ again.

There is a division of $ 2^2 $, so we do two binary splits. This time, I'm repeating the previously found numbers so you can see the regular divisions more clearly. We get quarters.

The next factor is $ 3 $ so we divide into thirds once. We now have twelfths.

For the last division we chop into fifths and get sixtieths.

$ \frac{21}{60} $ is now the 21st number from the left.

But this means we've found a clear way to visualize all the rational numbers between $ 0 $ and $ 1 $: it's all the numbers we can reach by applying a finite number of binary (2), ternary (3), quinary (5) etc. divisions, for any denominator. So there's always a finite gap between any two rational numbers, even though there are infinitely many of them.

The rational numbers are not continuous. Therefore, it is more accurate to picture them as a set of tick marks than as a connected number line.

To find continuity then, we need to revisit one of our earlier trees. We'll pick the binary one.
While every fork goes two ways, we actually have a third choice at every step: we can choose to stop. That's how we get a finite path to a whole fraction of $ 2^n $.

But what if we never stop? We have to apply a limit: we try to spot a pattern and try to fast-forward it. Note that by halving each step vertically on the graph, we've actually linearized each approach into a straight line which ends. Now we can take limits visually just by intersecting lines with the top edge.

Right away we can spot two convergent limits: by always choosing either the left or the right branch, we end up at respectively $ 0 $ and $ 1 $.

These two sequences both converge to $ \frac{1}{2} $. It seems that 'at infinity steps', the graph meets up with itself in the middle.

But the graph is now a true fractal. So the same convergence can be found here. In fact, the graph meets up with itself anywhere there is a multiple of $ \frac{1}{2^n} $.

That's pretty neat: now we can eliminate the option of stopping altogether. Instead of ending at $ \frac{5}{16} $, we can simply take one additional step in either direction, followed by infinitely many opposite steps. Now we're only considering paths that are infinitely long.

But if this graph only leads to fractions of $ 2^n $, then there must be gaps between them. In the limit, the distance between any two adjacent numbers in the graph shrinks down to exactly $ 0 $, which suggests there are no gaps. This infinite version of the binary tree must lead to a lot more numbers than we might think.
Suppose we take a path of alternating left and right steps, and extend it forever. Where do we end up?

We can apply the same principle of an upper and lower bound, but now we're approaching from both sides at once. Thanks to our linearization trick, the entire sequence fits snugly inside a triangle.

If we zoom into the convergence at infinity, we actually end up at $ \class{orangered}{\frac{2}{3}} $.
Somehow we've managed to coax a fraction of $ 3 $ out of a perfectly regular binary tree.

If we alternate two lefts with one right, we can end up at $ \class{orangered}{\frac{4}{7}} $. This is remarkable: when we tried to visualize all the rational numbers by combining all kinds of divisions, we were overthinking it. We only needed to take binary divisions and repeat them infinitely with a limit.

Every single rational number can then be found by taking a finite amount of steps to get to a certain point, and then settling into a repeating pattern of lefts and/or rights all the way to infinity.
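That observation turns straight into arithmetic. Writing 'right' as 1 and 'left' as 0 (the labeling adopted below), a repeating block of $ k $ bits with value $ v $ sums, as a geometric series, to $ \frac{v}{2^k - 1} $. A Python sketch:

```python
from fractions import Fraction

def repeating_binary(pattern):
    """Value of the binary fraction 0.pattern pattern pattern ...
    A repeating block of k bits with value v sums to v / (2^k - 1)."""
    k = len(pattern)
    v = int(pattern, 2)
    return Fraction(v, 2 ** k - 1)

print(repeating_binary("10"))    # 2/3: alternating right, left forever
print(repeating_binary("100"))   # 4/7: a right and two lefts, repeated
```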

If we can find numbers between $ 0 $ and $ 1 $ this way, we can apply the exact same principle to the range $ 1 $ to $ 2 $. So we can connect two of these graphs into a single graph with its tip at $ 1 $.

But we can repeat it as much as we like. The full graph is not just infinitely divided, but infinitely big, in that no finite box can contain it. That means it leads to every single positive rational number. We can start anywhere we like. Is your mind blown yet?

No? Ok. But if this works for positives, we can build a similar graph for the negatives just by mirroring it. So we now have a map of the entire rational number set. All we need to do is take infinite paths that settle into a repeating pattern from either a positive or a negative starting point. When we do, we find every such path leads to a rational number.
So any rational number can be found by taking an infinite stroll on one of two infinite binary trees.

Wait, did I say two infinite trees? Sorry, I meant one infinitely big tree.
See, if we repeatedly scale up a fractal binary tree and apply a limit to that, we end up with almost exactly the same thing. Only this time, the two downward diagonals always eventually fold back towards $ 0 $. This creates a path of infinity + 1 steps downward. While that might not be very practical, it suggests you can ride out to the restaurant at the end of the universe, have dinner, and take a single step to get back home.

Is it math, or visual poetry? It's time to bring this fellatio of the mind to its inevitable climax.

You may wonder, if this map is so amazing, how did we ever do without?
Let's label our branches. If we go left, we call it $ 0 $. If we go right, we call it $ 1 $.

$$ \frac{5}{3} = \class{green}{11}\class{blue}{0}\hspace{2pt}\class{green}{1}\class{blue}{0}\hspace{2pt}\class{green}{1}\class{blue}{0}… $$

We can then identify any number by writing out the infinite path that leads there as a sequence of ones and zeroes—bits.

But you already knew that.

$$ \frac{5}{3} = \class{green}{1}.\class{green}{1}\class{blue}{0}\hspace{2pt}\class{green}{1}\class{blue}{0}\hspace{2pt}\class{green}{1}\class{blue}{0}…_2 $$

See, we've just rediscovered the binary number system. We're so used to numbers in decimal, base 10, we didn't notice. Yet we all learned that rational numbers consist of digits that settle into a repeating sequence, a repeating pattern of turns. Disallowing finite paths works the same, even in decimal: the number $ 0.95 $ can be written as $\, 0.94999…\, $, i.e. take one final step in one direction, followed by infinitely many steps the other way.

$$ \frac{4}{5} = \class{blue}{0}.\class{green}{11}\class{blue}{00}\hspace{2pt}\class{green}{11}\class{blue}{00}…_2 $$

When we write down a number digit by digit, we're really following the path to it in a graph like this, dialing the number's … er … number. The rationals aren't shaped like a binary tree, rather, they look like a binary tree when viewed through the lens of binary division. Every infinite binary, ternary, quinary, etc. tree is then a different but complete perspective of the same underlying thing. We don't have the map, we have one of infinitely many maps.

$$ π = \class{green}{11}.\class{blue}{00}\class{green}{1}\class{blue}{00}\class{green}{1}\class{blue}{0000}\class{green}{1}…_2 $$

Which means we can show this graph is actually an interdimensional number portal.
See, we already know where the missing numbers are. Irrational numbers like $ π $ form a never-repeating sequence of digits. If we want to reach $ π $, we find it's at the end of an infinite path whose turns do not repeat. By allowing such paths, our map leads us straight to them. Even though it's made out of only one kind of rational number: division by two.

$$ π = \mathop{\class{no-outline}{►\hspace{-2pt}►}}_{\infty\hspace{2pt}} x_n \,? $$

So now we've invented real numbers. How do we visualize this invention? And where does continuity come in? What we need is a procedure that generates such a non-repeating path when taken to the limit. Then we can figure out where the behavior at infinity comes from.

Because the path never settles into a pattern, we can't pin it down with a single neat triangle like before. We try something else. At every step, we can see that the smallest number we can still reach is found by always going left. Similarly, the largest available number is found by always going right. Wherever we go from here, it will be somewhere in this range.

We can set up shrinking intervals by placing such triangles along the path, forming a nested sequence.

$$ \begin{align} 3 \leq & π \leq 4 \\ 3.1 \leq & π \leq 3.2 \\ 3.14 \leq & π \leq 3.15 \\ 3.141 \leq & π \leq 3.142 \\ 3.1415 \leq & π \leq 3.1416 \\ 3.14159 \leq & π \leq 3.14160 \\ \end{align} $$
$$ \begin{align} 11_2 \leq & π \leq 100_2 \\ 11.0_2 \leq & π \leq 11.1_2 \\ 11.00_2 \leq & π \leq 11.01_2 \\ 11.001_2 \leq & π \leq 11.010_2 \\ 11.0010_2 \leq & π \leq 11.0011_2 \\ 11.00100_2 \leq & π \leq 11.00101_2 \\ \end{align} $$

What we've actually done is rounded up and down at every step, to find an upper and lower bound with a certain amount of digits. This works in any number base.
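This rounding procedure is just bisection. A Python sketch that reproduces the binary bounds (in floating point, which is itself binary):

```python
import math

x = math.pi
lo, hi = 3.0, 4.0                # start with an interval containing π
for step in range(1, 9):
    mid = (lo + hi) / 2
    if x < mid:
        hi = mid                 # the next interval covers the left half
    else:
        lo = mid                 # ...or the right half
    print(step, lo, "<= pi <=", hi)
```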

Let's examine these intervals by themselves. We can see that due to the binary nature, each interval covers either the left or right side of its ancestor. Because our graph goes on forever, there are infinitely many nested intervals. This tower of $ π $ never ends and never repeats itself, we just squeezed it into a finite space so we could see it better.

If we instead approach a rational number like $ \frac{10}{3} = 3.333…\, $ then the tower starts repeating itself at some point. Note that the intervals don't slide smoothly. Each can only be in one of two places relative to its ancestor.

In order to reach a different rational number, like $ 3.999… = 4 $, we have to establish a different repeating pattern. So we have to rearrange infinitely many levels of the tower all at once, from one configuration to another. This reinforces the notion that rational numbers are not continuous.

If the tower converges to a number, then the top must be infinitely thin, i.e. $ 0 $ units wide. That would suggest it's meaningless to say what the interval at infinity looks like, because it stops existing. Let's try it anyway.

There is only one question to answer: does the interval cover the left side, or the right?

Oddly enough, in this specific case of $ 3.999…\, $ there is an answer. The tower leans to the right. Therefore, the state of the interval is the same all the way up. If we take the limit, it converges and the final interval goes right.

But we can immediately see that we can build a second tower that leans left, which converges on the same number. We could distinguish between the two by writing it as $ 4.000…\, $ In this case the final interval goes left.

If we approach $ \frac{10}{3} $, we take a path of alternating left and right steps. The state of the interval at infinity becomes like our paradoxical lamp from before: it has to be both left and right, and therefore it is neither; it's simply undefined.

The same applies to irrational numbers like $ π $. Because the sequence of turns never repeats itself, the interval flips arbitrarily between left and right forever, and is therefore in an undefined state at the end.

But there's another way to look at this.
If the interval converges to the number $ π $, then the two sequences of respectively lower and upper bounds also converge to $ π $ individually.

Remember how we derived our bounds: we rounded down by always taking lefts and rounded up by always taking rights. The shape of the tower depends on the specific path you're taking, not just the number you reach at the end.

That means we can think of all the lower bounds as ending in $ 0000… \, $ Their towers always lean left.

If we then take the limit of their final intervals as we approach $ π $, that goes left too. Note that this is a double limit: first we find the limit of the intervals of each tower individually, then we take the limit over all the towers as we approach $ π $.

For the same reason, we can think of all the upper bounds as ending in $ 1111 …\, $ Their towers always lean right. When we take the limit of their final intervals and approach $ π $, we find it points right.

But, we could actually just reverse the rounding for the upper and lower bounds, and end up with the exact opposite situation. Therefore it doesn't mean that we've invented a red $ π $ to the left and a green $ π $ to the right which are somehow different. $ π $ is $ π $. This only says something about our procedure of building towers. It matters because the towers are how we're trying to reach a real number in the first place.

See, our tower still represents a binary number of infinitely many bits. Every interval can still only be in one of two places. To run along the real number line, we'd have to rearrange infinitely many levels of the tower all at once to create motion. That still does not seem continuous.

We can resolve this if we picture the final interval of each tower as a bit at infinity. If we flip the bit at infinity, we swap between two equivalent ways of reaching a number, so this has no effect on the resulting number.

In doing so, we're actually imagining that every real number is a rational number whose non-repeating head has grown infinitely big. Its repeating tail has been pushed out all the way past infinity. That means we can flip the repeating part of our tower between different configurations without creating any changes in the number it leads to.

That helps a little bit with the intuition: if the tower keeps working all the way up there, it must be continuous at its actual tip, wherever that really is. A continuum is then what happens when the smallest possible step you can take isn't just as small as you want. It's so small that it no longer makes any noticeable difference. While that's not a very mathematical definition, I find it very helpful in trying to imagine how this might work.

Finally, we might wonder how many of each type of number there are.
The natural numbers are countably infinite: there is a procedure of steps which, in the limit, counts all of them.

$$ 1, 2, 3, 4, 5, 6, … $$

$$ \class{orangered}{2, 4, 6, 8, 10, 12, …} $$

$$ \class{green}{0, 1, -1, 2, -2, 3, …} $$

We can find a similar sequence for the even natural numbers by multiplying each number by two. We can also alternate between a positive and negative sequence to count the integers. The three sequences are all countably infinite, which means they're all equally long.
There are as many even positives as positives. Which is exactly as many as all the integers combined. As counter-intuitive as it is, it is the only consistent answer.

$$ \begin{array}{cccccccc}  1 \hspace{2pt}&\hspace{2pt} 2 \hspace{2pt}&\hspace{2pt} 3 \hspace{2pt}&\hspace{2pt} 4 \hspace{2pt}&\hspace{2pt} 5 \hspace{2pt}&\hspace{2pt} 6 \hspace{2pt}&\hspace{2pt} … \\[6pt]  \frac{1}{2} \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{2}{2}} \hspace{2pt}&\hspace{2pt} \frac{3}{2} \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{4}{2}} \hspace{2pt}&\hspace{2pt} \frac{5}{2} \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{6}{2}} \hspace{2pt}&\hspace{2pt} \\[3pt]  \frac{1}{3} \hspace{2pt}&\hspace{2pt} \frac{2}{3} \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{3}{3}} \hspace{2pt}&\hspace{2pt} \frac{4}{3} \hspace{2pt}&\hspace{2pt} \frac{5}{3} \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{6}{3}} \hspace{2pt}&\hspace{2pt} \cdots \\[3pt]  \frac{1}{4} \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{2}{4}} \hspace{2pt}&\hspace{2pt} \frac{3}{4} \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{4}{4}} \hspace{2pt}&\hspace{2pt} \frac{5}{4} \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{6}{4}} \hspace{2pt}&\hspace{2pt} \\[3pt]  \frac{1}{5} \hspace{2pt}&\hspace{2pt} \frac{2}{5} \hspace{2pt}&\hspace{2pt} \frac{3}{5} \hspace{2pt}&\hspace{2pt} \frac{4}{5} \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{5}{5}} \hspace{2pt}&\hspace{2pt} \frac{6}{5} \hspace{2pt}&\hspace{2pt} \\[3pt]  \frac{1}{6} \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{2}{6}} \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{3}{6}} \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{4}{6}} \hspace{2pt}&\hspace{2pt} \frac{5}{6} \hspace{2pt}&\hspace{2pt} \class{grey}{\frac{6}{6}} \hspace{2pt}&\hspace{2pt} \\[3pt] \hspace{2pt}&\hspace{2pt} \vdots \hspace{2pt}&\hspace{2pt} \hspace{2pt}&\hspace{2pt} \vdots \hspace{2pt}&\hspace{2pt} \hspace{2pt}&\hspace{2pt} \hspace{2pt}&\hspace{2pt} \hspace{2pt}&\hspace{2pt} \class{white}{\ddots}  \end{array} $$

But we can take it one step further: we can find such a sequence for the rational numbers too, by laying out all the fractions on a grid. We can follow diagonals up and down and pass through every single one. If we eliminate duplicates like $ 1 = 2/2 = 3/3 $ and alternate positives and negatives, we can 'count them all'. So there are as many fractions as there are natural numbers. "Deal with it", says Infinity, donning its sunglasses.
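The diagonal walk is concrete enough to write down. Here's a sketch in JavaScript (the function names are mine, not the article's); it ignores the up-and-down zigzag and the negative mirror images, which only change the order, not the count:

// Count the positive fractions by walking the grid diagonal by diagonal.
// Every p/q with p + q = d sits on one diagonal, and checking for reduced
// form skips duplicates like 2/4 that equal an earlier fraction.
function gcd(a, b) {
  return b === 0 ? a : gcd(b, a % b);
}

function listRationals(count) {
  var found = [];
  for (var d = 2; found.length < count; d++) {
    for (var p = 1; p < d && found.length < count; p++) {
      var q = d - p;
      if (gcd(p, q) === 1) { // skip 2/4, 3/3, ... already counted
        found.push(p + "/" + q);
      }
    }
  }
  return found;
}

// listRationals(10) ->
// ["1/1", "1/2", "2/1", "1/3", "3/1", "1/4", "2/3", "3/2", "4/1", "1/5"]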

$$ \begin{array}{c} 0.\hspace{1pt}\class{green}{1}\hspace{1pt}0\hspace{1pt}0\hspace{1pt}1\hspace{1pt}1\hspace{1pt}1\hspace{1pt}0\hspace{1pt}…_2 \\ 0.\hspace{1pt}1\hspace{1pt}\class{blue}{0}\hspace{1pt}0\hspace{1pt}1\hspace{1pt}0\hspace{1pt}0\hspace{1pt}1\hspace{1pt}…_2 \\ 0.\hspace{1pt}1\hspace{1pt}0\hspace{1pt}\class{green}{1}\hspace{1pt}0\hspace{1pt}0\hspace{1pt}1\hspace{1pt}0\hspace{1pt}…_2 \\ 0.\hspace{1pt}0\hspace{1pt}1\hspace{1pt}1\hspace{1pt}\class{green}{1}\hspace{1pt}0\hspace{1pt}1\hspace{1pt}1\hspace{1pt}…_2 \\ 0.\hspace{1pt}1\hspace{1pt}0\hspace{1pt}1\hspace{1pt}1\hspace{1pt}\class{blue}{0}\hspace{1pt}0\hspace{1pt}1\hspace{1pt}…_2 \\ 0.\hspace{1pt}0\hspace{1pt}1\hspace{1pt}0\hspace{1pt}1\hspace{1pt}0\hspace{1pt}\class{blue}{0}\hspace{1pt}0\hspace{1pt}…_2 \\ 0.\hspace{1pt}0\hspace{1pt}1\hspace{1pt}1\hspace{1pt}1\hspace{1pt}1\hspace{1pt}0\hspace{1pt}\class{green}{1}\hspace{1pt}…_2 \\ … \\ \\ 0.\hspace{1pt}\class{blue}{0}\hspace{1pt}\class{green}{1}\hspace{1pt}\class{blue}{0\hspace{1pt}0}\hspace{1pt}\class{green}{1\hspace{1pt}1}\hspace{1pt}\class{blue}{0}\hspace{1pt}…_2 \end{array} $$

The real numbers on the other hand are uncountably infinite: no process can list them all in the limit. The proof is short. Suppose we did have a sequence of all the real numbers between $ 0 $ and $ 1 $. We could then build a new number by taking all the bits on the diagonal, and flipping zeroes and ones. That number differs from every listed number in at least one digit, so it's not on the list. But it's also between $ 0 $ and $ 1 $, so it should be. Therefore, the list can't exist.
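The construction itself is mechanical. A toy version on a finite list of bit strings (my own illustration, not the article's):

// Cantor's diagonal on a toy list of binary expansions: flip the n-th
// digit of the n-th row. The result disagrees with every row somewhere,
// so it can't be anywhere on the list.
function diagonal(rows) {
  var digits = [];
  for (var n = 0; n < rows.length; n++) {
    digits.push(rows[n][n] === "0" ? "1" : "0");
  }
  return "0." + digits.join("") + "…";
}

// diagonal(["1001", "1010", "0111", "1100"]) -> "0.0101…"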

This even matches our intuitive explanation from earlier. There are so many real numbers, that we had to invent a bit at infinity to count them, and find something that would tick at least once for every real number. Even then we couldn't say whether it was $ 0 $ or $ 1 $ anywhere in particular, because it literally depends on how you approach it.

What we just did was a careful exercise in hiding the obvious, namely the digit-based number systems we are all familiar with. By viewing them not as digits, but as paths on a directed graph, we get a new perspective on just what it means to use them. We've also seen how we can construct the rationals and reals using the fewest ingredients possible: division by two, and limits.

Drowning By Numbers

In school, we generally work with the decimal representation of numbers. As a result, the popular image of mathematics is that it's the science of digits, not of the underlying structures they represent. This permanently skews our perception of what numbers really are, and that's easy to demonstrate. You can google countless arguments about why $ 0.999… $ is or isn't equal to $ 1 $. Yet nobody wonders why $ 0.000… = 0 $, though it's practically the same problem: $ 0.1, 0.01, 0.001, 0.0001, … $

Furthermore, in decimal notation, rational numbers and real numbers look incredibly alike: $ 3.3333… $ vs $ 3.1415…\, $ The question of what it actually means to have infinitely many non-repeating digits, and why this results in continuous numbers, is hidden away in those 3 dots at the end. By imagining $ π $ as $ 3.1415…0000… $ or $ 3.1415…1111… $ we can intuitively bridge the gap to the infinitely small. We see how the distance between two neighbouring real numbers must be so small, that it really is equivalent to $ 0 $.

That's not as crazy as it sounds. In the field of hyperreal numbers, every number actually has additional digits 'past infinity': that's its infinitesimal part. You can imagine this to be a multiple of $ \frac{1}{\infty} $, an infinitely small unit greater than $ 0 $, which I'll call $ ε $. The idea of equality is then replaced with adequality: being equal aside from an infinitely small difference.

You can explore this hyperreal number line below.

Note that $ ε^2 $ is also infinitesimal. In fact, it's even infinitely smaller than $ ε $, and we can keep doing this for $ ε^3, ε^4, …\,$ To make matters worse, if $ ε $ is infinitesimal, then $ \frac{1}{ε} $ must be infinitely big, and $ \frac{1}{ε^2} $ infinitely bigger than that. So hyperreal numbers don't just have inwardly nested infinitesimal levels, but outward levels of increasing infinity too. They have infinitely many dimensions of infinity both ways.

So it's perfectly possible to say that $ 0.999… $ does not equal $ 1 $, if you mean they differ by an infinitely small amount. The only problem is that in doing so, you get much, much more than you bargained for.

A Tug of War Between the Gods

That means we can finally answer the question we started out with: why did our continuous atoms seemingly all have $ 0 $ mass, when the total mass was not $ 0 $? The answer is that the mass per atom was infinitesimal. So was each atom's volume. The density, mass per volume, was the result of dividing one infinitesimal amount by another, to get a normal sized number again. To create a finite mass in a finite volume, we have to add up infinitely many of these atoms.

These are the underlying principles of calculus, and the final puzzle piece to cover. The funny thing about calculus is, it's conceptually easy, especially if you start with a good example. What is hard is actually working with the formulas, because they can get hairy very quickly. Luckily, your computer will do them for you:

We're going to go for a drive.

We'll graph speed versus time. We have kilometers per hour vertically, and hours horizontally. We've also got a speedometer—how fast—and an odometer—how far.

Suppose we drive for half an hour at 50 km/h.

$ \class{orangered}{25} $

We end up driving for 25 km. This is the area spanned by the two lengths: $ 50 \cdot \frac{1}{2} $, a rectangle.

$ \class{orangered}{60} $

Now we hit the highway and maintain 120 km/h for the rest of the hour. We go an additional 60 km, the area of the second rectangle, $ 120 \cdot \frac{1}{2} $.
Whenever we multiply two units like speed and time, we can always visualize the result as an area.

$ \class{slate}{85} $

Because we crossed 85 km in one hour, this is equivalent to driving at a constant speed of 85 km/h for the duration. The total area is the same.

If this were a race between two different cars, we would see a photo finish. The distance travelled in kilometers is identical at the 1 hour mark. Where they differ is in their speed along the way, with the red car falling behind and then catching up.

The difference is visible in the slope of both paths. The faster the car, the more quickly it accumulates kilometers. If it drove 25 km in half an hour, then its speed was 50 km/h, $ \frac{25}{0.5} $. This is the distance travelled divided by the time it took, vertical divided by horizontal.

Slope is a relative thing. If we shrink the considered time, the distance shrinks along with it, and the resulting speed is the same. What we're really doing is formalizing the concept of a rate of change, of distance over time.

Constant speed means a constant increase in distance. We can directly relate the area being swept out left to right with the accumulated distance by each car. This is clue number 1.

Now suppose the red car starts ahead by 10 km and drives the same speeds.
It will also end up 10 km ahead after 1 hour; its path has simply been shifted up by 10 units. The slope is unchanged: it doesn't matter where you are and where you've been, only how fast you're going right now. It's what's called an instantaneous quantity: it describes a situation only in the moment. This is clue number 2.

In order to get ahead, the red car had to drive there. So we can imagine it started earlier, $ \frac{1}{5} $ of an hour, driving for 10 km at the same speed. Again, the equality holds: area swept out equals accumulated distance, we add another $ 50 \cdot \frac{1}{5} $. Constant slope still equals constant speed.

One curve describes how the other changes in the moment, therefore the two quantities are linked somehow. We add up area to go from speed to distance; we find slope to go from distance to speed. We're going to examine this two-way relationship more.

Real cars don't start or stop on a dime, they accelerate and decelerate. So we're going to try more realistic behavior.

Suppose the speed follows a curve. In one hour, the car starts from 0 km/h, accelerates to over 100 km/h and then smoothly decelerates back to standstill. The distance travelled also curves smoothly, from 0 to 60 km, so we've driven 60 km in total.

We can immediately see that at the point where the car was going fastest, the distance was increasing the most. Its slope is steepest at that point. The relationship between the two curves holds.

But actually measuring it is a problem. First, there are no more straight sections to measure the slope on. If we take two points on a curve, the line that connects them doesn't run along the curve; it crosses it at an angle.

Second, we can no longer measure the area by dividing it into rectangles, or any other simple geometric shape. There will always be gaps. We can solve both of these problems with a dash of infinity.

We'll start with area. We have to find an upper and a lower bound again.
We're going to divide the curve into 4 sections.

First, the upper bound. We find the highest value in each section and make a rectangle of that height. This approach is too greedy and overestimates.

The lower bound is similar. We find the smallest value in each interval and make rectangles of that height.
This underestimates and leaves areas uncovered.

If we do 7 divisions instead, we can see that the upper bound has decreased: there is less excess area. The lower bound has increased: the gaps are smaller and more area is covered.

With 10 divisions, it's even better. It seems the upper and lower bounds are approaching each other.

And the same at 13 divisions. If we keep doing this, our slices will get thinner and thinner, and we'll be adding more of them together. If we take a limit, each slice becomes infinitely thin, and there are infinitely many of them. Let's step back and see what that means.
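This squeezing is easy to mimic numerically. A sketch, with a made-up speed curve standing in for the animated one above:

// Squeeze the area under a curve between a lower and an upper sum.
// speed is a stand-in for the article's curve: it starts at 0,
// peaks at 120 km/h, and encloses exactly 60 km over one hour.
var speed = function (t) {
  return 120 * Math.pow(Math.sin(Math.PI * t), 2);
};

function bounds(f, n) {
  var width = 1 / n, lower = 0, upper = 0;
  for (var i = 0; i < n; i++) {
    var lo = Infinity, hi = -Infinity;
    // scan each section for (approximately) its lowest and highest value
    for (var j = 0; j <= 10; j++) {
      var y = f((i + j / 10) * width);
      lo = Math.min(lo, y);
      hi = Math.max(hi, y);
    }
    lower += lo * width; // rectangles that underestimate
    upper += hi * width; // rectangles that overestimate
  }
  return [lower, upper];
}

// bounds(speed, 4), bounds(speed, 13), bounds(speed, 100):
// the two numbers close in on 60 from either side as n grows.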

Take for example the sequence of lower bounds.

Because every slice is equally wide, we can glue them together into a single rectangle per step.
Its width $ w $ is the thickness of a single slice, and its height $ h $ is the sum of the heights of the slices.

In the limit, this rectangle becomes both infinitely thin and infinitely tall. This is a tug of war between Zero and Infinity where at first sight, they both seem to win. That's a problem. Luckily, we're not interested in the rectangle itself, but rather its area.

We can change a rectangle's sides without changing its area. We multiply its width by one factor (e.g. $ 2 $), and divide the height by the same amount. The area $ 2w \cdot \frac{h}{2} $ is unchanged. Hence, we can normalize our rectangles to all have the same width, for example $ 1 $.

We can do the same for the upper bounds. We can see that both areas are converging on the same value. This is the true area under the curve, which is neither zero nor infinite. In this tug of war, both parties are equally matched.

Now our sequence looks very different: it's approaching a definite area, sandwiched between red and blue.

$ \class{slate}{60} $

If we take the limit, we get the area under our curve.

$ \class{orangered}{60} $

This way we can find the area under any smooth curve. This process is called integration. The symbol for integration is $ \int_a^b $ where $ a $ and $ b $ are the start and end points you're integrating between. The S-shape stands for our sum, adding up infinitely many pieces.

$$ \int_0^T \! f(t) \mathrm{d} t $$

We can then integrate one curve to make another, by sweeping out area horizontally from a fixed starting point. We move the end point to a time $ T $ and plot the accumulated value along the way. Using limits, we can do this continuously. This takes us from speed to distance travelled. The quantity $ \,\mathrm{d}t\, $ is the infinitesimal width of each slice, an infinitely small amount of time.
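In code, the sweeping end point becomes a running sum. A sketch, reusing the stand-in speed curve from before:

// Integrate speed into distance by sweeping from 0 to T, accumulating
// thin slices of speed × time. dt plays the role of the infinitesimal
// slice width; shrinking it approaches the limit.
function integrate(f, T, dt) {
  var distance = 0;
  for (var t = 0; t < T; t += dt) {
    distance += f(t + dt / 2) * dt; // sample each slice at its midpoint
  }
  return distance;
}

// integrate(speed, 1, 0.001)   -> ≈60: the full hour sweeps out 60 km
// integrate(speed, 0.5, 0.001) -> ≈30: the accumulated distance halfway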

Now we just need to figure out the reverse and find slopes. We'll go back to our failed attempt from earlier.

If we shrink the distance we're considering, our slope estimate gets closer to the true value. But if we try to take a limit, we end up dividing $ 0 $ by $ 0 $.

Instead we need to normalize our sequence again so it doesn't vanish.

We only care about slope: the ratio of the triangle's two perpendicular sides. Which means, if we scale up each triangle, the ratio is unchanged. That just comes down to multiplying both sides by the same number. Again we can scale them all to the exact same width.

Now we've created a limit that does converge to something rather than nothing.

This finite value is the slope at the point we were homing in on. Because we can apply this process at any point on the curve, we can find the exact slope anywhere. This is called finding the derivative or differentiation.

$$ \frac{ \mathrm{d} f(t) }{\mathrm{d} t} $$

We can also apply this process over an entire curve to generate a new one. So now we know how to go the other way: distance to speed. Mathematically, we are dividing an infinitesimal piece of the distance, the curve $ \,\mathrm{d} \class{slate}{f(t)}\, $, by an infinitesimal slice of time $ \,\mathrm{d} t\, $. Working with infinitesimal formulas is tricky however. There's always an implied limit being taken in order to reach them in the first place.
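Numerically, this is the same triangle-shrinking stopped at a small but finite width. A sketch, building on the earlier stand-ins:

// Estimate the slope of f at time t: a small rise divided by a small run.
// Shrinking h walks down the same sequence of triangles as before,
// stopping just short of the 0/0 limit.
function derivative(f, t, h) {
  return (f(t + h) - f(t - h)) / (2 * h); // centered difference
}

var distance = function (t) { return integrate(speed, t, 0.001); };

// derivative(distance, 0.5, 0.01) -> ≈120 km/h,
// recovering the speed curve's peak at the half hour mark.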

We can note that if we shift the distance curve up or down, the speed is unchanged. When you take a derivative, any constant value you've added to your function simply disappears. This shows again that speed is always in the moment, it only describes what's going on in an infinitely short piece of curve.

Differentiation is then like x-ray specs for curves and quantities, and it's turtles all the way down. For example, if we differentiate speed, we get acceleration. This is another rate of change, of speed over time. We see the car's acceleration is initially positive, speeding up, and then goes negative, to slow down, i.e. accelerate in the opposite direction.
Note: The acceleration has been divided by 4 to fit.

If we integrate acceleration to get speed, we have to count the second part as negative area: it is causing the speed to decrease.

We can see that the point of maximum speed is the point where the acceleration passes through $ 0 $. One of the most useful applications of derivatives is indeed to find a maximum or minimum of a curve more easily. No matter where it is, the slope at such a point must always be horizontal.

Let's end this with a more exciting example. What's tall, fast and makes kids scream?

A roller coaster! We'll construct a little track by welding together pieces of circles and lines.

Alas, we shouldn't be too proud of our creation. Even though it looks smooth, there's something very wrong. This is how you build roller coasters when you don't want people to have fun. To see the problem, we need to use our x-ray specs.

$$ \class{orangered}{f^{\prime}(x)} = \frac{\mathrm{d}\class{slate}{f(x)}}{\mathrm{d}x} $$

We differentiate the height into its slope. It has sharp corners all over the place. Even though the track itself looks smooth, it doesn't change smoothly. The slope is constant in the straight sections and changes rapidly in the curved sections.

$$ \class{green}{f^{\prime\prime}(x)} = \frac{\mathrm{d^2}\class{slate}{f(x)}}{\mathrm{d}x^2} $$

If we take the derivative of the slope, i.e. find the slope's slope, we get a measure of curvature. It's positive inside valleys, negative on top of crests. This graph is even worse: there are sharp peaks and cliffs. Note that in the formula, we are now dividing by the square of the infinitesimal distance $ \mathrm{d}x $. This is like going two levels down on the hyperreal number line and back up again.

$$ \class{teal}{κ(x)} = \frac{1}{ρ} = \frac{ \class{green}{f^{\prime\prime}(x)} } { (1 + \class{orangered}{f^{\prime}(x)}^2)^{3/2} } $$

We can see better if we replace the second derivative with the 2D curvature $ \class{teal}{κ(x)} $.
The radius $ ρ $ is that of the circle which touches the curve at a given point. Because this radius gets infinitely big on straight sections, we use its inverse, $ \class{teal}{κ} = \frac{1}{ρ} $. Because of how we built the track, $ κ $ switches between $ 0 $ and a constant positive or negative value.
At every switch, there will be a corresponding change in force, a jerk.
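The formula translates directly to sampled track data, with finite differences standing in for the derivatives. The sampling scheme here is hypothetical, not the article's:

// Curvature of a sampled height profile, where heights[i] is the track
// height at x = i * dx and i is an interior point.
function curvature(heights, i, dx) {
  var d1 = (heights[i + 1] - heights[i - 1]) / (2 * dx);                   // f'
  var d2 = (heights[i + 1] - 2 * heights[i] + heights[i - 1]) / (dx * dx); // f''
  return d2 / Math.pow(1 + d1 * d1, 1.5);                                  // κ = 1/ρ
}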

Let's simulate a ride. As riders go through our curved sections, their inertia will push them to the outside of the curve. From their point of view, this is a centrifugal force up or down. We'll plot the (subjective) vertical G force including gravity. It starts at a comfy 1 G, but then swings wildly between 0.5 G and 1.25 G.

Even though the track seems smooth, we can see that the vertical G's are not. Every time we enter a curve, we experience a sudden jerk up or down. This is due to the jumps in the curvature. The G's are themselves curved, because the rider's sense of gravity decreases as the cart goes vertical. The sharp dips below 0.5 G are not simulation errors: this is actually what it would feel like.

To really highlight the problem, we need to x-ray the G's and differentiate again. G forces are a form of acceleration. The derivative of acceleration is a change in force (per unit of mass), called jerk. Whenever it's non-zero, you feel jerked in a particular direction.

To fix this, we need to alter the curve of the track and smooth it out at all the different levels of differentiation. Here I've applied a relaxation procedure. It's like a blur filter in Photoshop: we replace every point on the track with the average of its neighbours, as sketched below. We get a subtly different curve. Its height hasn't changed much at all, it's just a little bit less tense.
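The pass itself is one line of averaging, applied over and over. A sketch of what it might look like; repeated often enough, it is also a discrete version of heat flow:

// One relaxation pass replaces every interior point with the average of
// its neighbours; iterating smooths slope and curvature alike, exactly
// like heat equalizing along a metal rod.
function relax(heights, passes) {
  for (var n = 0; n < passes; n++) {
    var next = heights.slice();
    for (var i = 1; i < heights.length - 1; i++) {
      next[i] = (heights[i - 1] + heights[i + 1]) / 2;
    }
    heights = next;
  }
  return heights;
}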

But this minor change has a huge effect on both slope and radius of curvature. They are completely smoothed out, with all corners and jumps removed.

If we do another simulation, the G force graph looks completely different. There are no more jumps.

But the real difference is in jerk. There are no more actual jerks, only smooth oscillations. Instead of bruises, riders will get butterflies. Thanks to calculus, we avoided that painful lesson without ever having to ride it ourselves.

Please check your pockets for loose items. Lost property will not be returned.

Let's start with the original, unrelaxed track. Thanks to calculus, we can simulate head-bobbing so you can get a feel for how jerky this is. Even virtually, this isn't very pleasant.

This is the improved track. Notice the smooth transitions in and out of curves.

And that's how you make sweet roller coasters: by building them out of infinitely small, smooth pieces, so you don't get jerked around too much.

That was differential and integral calculus in a nutshell. We saw how many people actually spend hours every day sitting in front of an integrator: the odometers in their cars, which integrate speed into distance. And the derivative of speed is acceleration—i.e. how hard you're pushing on the gas pedal or brake, combined with forces like drag and friction.

By using these tools in equations, we can describe laws that relate quantities to their rates of change. Drag, also known as air resistance, is a force which gets stronger the faster you go. This is a relationship between the first and second derivatives of position.

In fact, the relaxation procedure we applied to our track is equivalent to another physical phenomenon. If the curve of the coaster represented the temperature along a thin metal rod, then the heat would start to equalize itself in exactly that fashion. Temperature wants to be smooth, eventually averaging out completely into a flat curve.

Whether it's heat distribution, fluid dynamics, wave propagation or a head bobbing in a roller coaster, all of these problems can be naturally expressed as so called differential equations. Solving them is a skill learned over many years, and some solutions come in the form of infinite series. Again, infinity shows up, ever the uninvited guest at the dinner table.

Closing Thoughts

Infinity is a many splendored thing but it does not lift us up where we belong. It boggles our mind with its implications, yet is absolutely essential in math, engineering and science. It grants us the ability to see the impossible and build new ideas within it. That way, we can solve intractable problems and understand the world better.

What a shame then that in pop culture, it only lives as a caricature. Conversations about infinity occupy a certain sphere of it—Pink Floyd has been playing on repeat, and there's usually someone peddling crystals and incense nearby.
"Man, have you ever, like, tried to imagine infinity…?" they mumble, staring off into the distance.

"Funny story, actually. We just came from there…"

Comments, feedback and corrections are welcome on Google Plus. Diagrams powered by MathBox.

More like this: How to Fold a Julia Fractal.

On WebGL

More than pretty pictures

Like a dragon, WebGL slumbers. But you've seen them, right? Those seemingly magical demos that transform your ordinary browser into a lush 3D world with one click?

While available in Chrome and Firefox on the desktop, WebGL is still not widely supported. So far it's mostly used for demo projects and flashy one-off brochures. On the few mobile devices that support it, you need developer access to enable it. It's certainly nowhere near ready for prime time. So why should you care?

The Black Sheep

The goal of WebGL is to bring the graphics capabilities of traditional apps and games into the browser, with performance as the main benefit. The graphics hardware does the work directly, leaving the CPU to just coordinate. Yet the developers of those apps and games look on with skepticism: "You mean we have to code in JavaScript?" There's grumbling about the limited capabilities too, which lag a few years behind the latest OpenGL and Direct3D APIs, and there are worries about copyright and modding.

First, we have to be honest: there's no question that native apps and 3D engines will continue to excel, bringing cutting edge graphics and performance. The layers of indirection in both HTML5 and WebGL cannot be circumvented.

But they do serve a purpose: to provide a safe sandbox for untrusted code from the web at large. Even triple-A gaming titles still occasionally crash, a result of their complexity, with thread synchronization, memory management and manual context-switching the price to pay. Random phishers shouldn't have that level of access to your system, nor should it be required.

WebGL represents a different way of using high-performance graphics: not as a bare metal API with caveats, but as a safe service to be exposed, to be flicked on or off without a second thought. It may not sound like much, but the security implications are big and will only be solved carefully, over time. It's undoubtedly a big reason behind Apple and Microsoft's reluctance to embrace it.

We should also note that this isn't a one-way cross-over. HTML has already snuck into the real-time graphics scene. First we saw in-game web views and browsers, then UIs such as Steam's overlay. In fact, all of Steam is WebKit. The main benefit is familiarity: designers can use the well-known techniques of the web both inside and outside the game. This mirrors the way Adobe Flash entered the gaming space before, being used to drive menus and overlays in many games.

It's been said that the skills required for front-end web development and game development eventually converge on the same thing. The technologies certainly have.

The Procedural Canvas

The web is the world's only universal procedural medium. Content is downloaded in partially assembled form, and you and your browser decide how it should be displayed. The procedural aspect has always been there, and today's practice of responsive design is just another evolution in procedural page layout. It all started with resizable windows and tables.

But when we decide to put a graphic into a page, we still bake it into a grid of pixels and send that down the pipe. This has worked great as a delivery mechanism, but is starting to show its age, due to high DPI displays and adaptive streaming.

It's also pushed the web further towards consumption: YouTube and Tumblr are obvious results. Both sites have a huge asymmetry between content creator and consumer, encouraging sharing rather than creating.

Real-time graphics level the playing field: once built, both creator and consumer have the same degree of control—at least in theory. All the work necessary to produce the end result is ideally being done 60 times per second. The experience of e.g. playing a game is like a sort of benign DRM, which requires you to access the content in a certain way. All native apps implement such 'DRM' by accident: their formats are binary and often proprietary, the code is compiled. Usually modding is supported in theory, but the tools simply aren't included.

The web is different. No matter how obfuscated, all code eventually has to talk to an interface that is both completely open and introspective. You can hook into any aspect of it and watch the data. There isn't a serious web developer around who would argue that this is a bad thing, who hasn't spent time deconstructing a site through a web inspector on a whim.

This is where WebGL gets interesting. It takes the tools normally reserved for, well, the hardcore geeks, and makes them much more open and understandable. I can certainly say from experience that coding with an engine like Three.js is an order of magnitude more productive than e.g. Ogre3D in C++. For most of the things I want to do with it, the performance difference is negligible, but there is much less code. Once you get your dev environment going, creating a new 3D scene is as simple as opening a text file. You can interact with your code live through the console for free.
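For a sense of what that means, here's roughly what a complete Three.js scene amounts to. A sketch from memory rather than the article; exact class names (e.g. BoxGeometry, formerly CubeGeometry) have shifted between Three.js versions:

// A complete spinning-cube scene in Three.js.
var scene = new THREE.Scene();
var camera = new THREE.PerspectiveCamera(45, window.innerWidth / window.innerHeight, 0.1, 100);
camera.position.z = 5;

var renderer = new THREE.WebGLRenderer();
renderer.setSize(window.innerWidth, window.innerHeight);
document.body.appendChild(renderer.domElement);

var cube = new THREE.Mesh(
  new THREE.BoxGeometry(1, 1, 1),
  new THREE.MeshNormalMaterial()
);
scene.add(cube);

function frame() {
  requestAnimationFrame(frame);
  cube.rotation.y += 0.01;        // animate
  renderer.render(scene, camera); // redraw, aiming for 60 fps
}
frame();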

More so, it integrates with the publishing tools we already know. I wonder for example how many hours of dev time the game industry has spent reinventing the wheel for fonts, menus, option screens, etc. To be fair, they often do so with amazing production value. But guess what: you now have CSS 3D, and soon you'll have CSS shaders. You don't need custom in-house tools when your designers can just use Chrome's Inspector and get the exact same result. Content delivery is easy: you have cloud storage, CDNs and memory caches at your disposal.

There is a missing link however: WebGL is a canvas inside the DOM, isolated from what's outside. But you could imagine APIs to help bring DOM content into a WebGL texture, taking over responsibility for drawing it. After all, most web browsers already use hardware acceleration to compose 2D web pages on screen. The convergence has already started.

The web has a history of transformative changes. CSS gave us real web design, Flash gave us ubiquitous video, Firebug gave us Web Inspectors, jQuery gave us non-painful DOM manipulation, and so on. None of these ideas were new in computing when they debuted, the web merely adapted to fill a need. WebGL is an idea in a similar vein, a base platform for an ecosystem of specialized frameworks on top.

It can help lead to a WolframAlpha-ized LCARS future, where graphics can be interactive and introspective by default. Why shouldn't you be able to click on a news graphic to filter the view, or download the dataset? For sure, this is not something that uniquely requires WebGL, and tools like d3.js are already showing the way with CSS and SVG. As a result, the last mile of interactivity becomes a mere afterthought: everything is live anyway. What WebGL does is raise the limit significantly on what sort of content can be displayed in a browser. It's not until those caps are lifted that we can say with a straight face that web apps can rival native apps.

Still, we shouldn't be aiming to recreate Unreal Engine in HTML / JS / GL, though someone will probably try, and eventually succeed. Rather we should explore what happens when you put a 3D engine inside a web page. Is it web publishing, or demoscene? Does it matter?

A Useful Baseline

In this light, WebGL's often lamented limitation becomes its strength. WebGL is not modelled after 'grown-up' OpenGL, but mirrors OpenGL ES (Embedded Systems). It's a suite of functionality supported by most mobile devices, but eclipsed by even the crummiest integrated laptop graphics from 3 years ago.

This needn't be a worry for two reasons. First, WebGL supports extensions, which add to the functionality and continue to be specced out. A WebGL developer can inspect the capabilities of the system and determine an appropriate strategy to use. Many extensions are widely supported, and even without extensions, all GL code is already subject to the platform's size limits on resources. WebGL is no different from other APIs, it just puts the bar a bit lower than usual.
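Querying those capabilities is part of the core API, along the lines of this sketch (in some browsers of the era the context name was 'experimental-webgl'):

// Probe the local GPU before choosing a rendering strategy.
var gl = document.createElement("canvas").getContext("webgl");

console.log(gl.getParameter(gl.MAX_TEXTURE_SIZE));   // e.g. 4096
console.log(gl.getParameter(gl.MAX_VERTEX_ATTRIBS)); // e.g. 16
console.log(gl.getSupportedExtensions());            // optional extras

// Ask for an extension, and pick a fallback path if it's missing.
var floatTextures = gl.getExtension("OES_texture_float");
if (!floatTextures) { /* use a lower-precision strategy */ }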

Second of all, it means WebGL is the only 3D API that has a shot at being universal, from desks to laps to pockets to living rooms, and everything in between. Your game console could be an Android computer, handheld or appliance. Your TV might run Linux or iOS. So might your fridge. WebGL fits with where hardware and software is going, and adapting to various devices is nothing new for the web. I imagine we might see a standardized benchmark library pop up, and developer tools to make e.g. desktop Chrome mimic a cellphone's limited capabilities.

Never Seen The Sky - WebGL Demo

For the Christmas demo above, I included a simple benchmark that pre-selects the demo resolution based on the time needed to generate assets up front. Additionally, it was built on a 4 year old laptop GPU, so it should run well for the majority of viewers on first viewing. The same can't be said for cutting-edge demoscene demos, which often only run smoothly on top of the line hardware. I know I'm usually resigned to watching them on YouTube instead. As neat as tomorrow's tech is, for most people it only matters what they have today.

This is the biggest philosophical difference between WebGL and OpenGL. WebGL aims to be a good enough baseline that you can carry in your pocket as well as put on a big screen, and make accessible with a simple link. I don't expect graphics legends like John Carmack to take anything but a cursory glance at it, but then, it's not encroaching on his territory. It is a bit surprising though that the demoscene hasn't taken to the web more quickly. It has never been about having top of the line hardware, only what you use it for. Contests like JS1K continue to demonstrate JavaScript's expressiveness, but we haven't really seen the bigger guns come out yet.

And it really is good enough. Here's 150,000 cubes, made out of 1.8 million triangles:

Next up is a fractal raytracer. At 30 frames per second, 512x512 pixels, 40 iterations per pixel, each folding 3D space 18 levels deep… that's 5.6 billion folds per second. That's just the core loop and excludes set up and lighting. It's all driven by a couple kilobytes of JavaScript wrapped in some HTML, delivered over HTTP.
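For the record, the multiplication behind that figure:

$$ 30 \times 512 \times 512 \times 40 \times 18 = 5{,}662{,}310{,}400 \approx 5.6 \times 10^9 $$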

Distance estimation with fractals

Why wouldn't you want to play with that? Come try WebGL, the water's fine.

Further reading

Examples by the amazing AlteredQualia, Felix Woitzel, Florian Bösch, the Ro.me team, Mr Doob, Chrome Workshop, as well as myself. Many of these techniques are documented on Iñigo Quilez's excellent site.

Additional demos and comments are welcome on Google Plus.

Storms and Teacups

Storm in a Teacup

If you've been paying attention, you'll have seen a lot more discussions about gender, feminism and harassment lately. The conversation mostly revolves around the latest incident of the day. I'd like to reflect on the bigger picture instead, and talk about some uncomfortable truths.

This is about how we act, online and offline, and why we do it.
Please read it top to bottom, or not at all.

Special thanks go to the folks who took time to provide feedback on drafts.

The examples used in this article, whether positive or negative, are chosen for their representative nature. They are not unique exceptions that deserve special sympathy, scrutiny or scorn.

Table of Contents

  1. The Shametweet
  2. Objectification
  3. The Social Justice Warriors
  4. Women in Open Source
  5. The Anti-Harassment Policy
  6. Beating Which Odds?
  7. Breaking Out of The Filter Bubble

The Shametweet

Atlassian, provider of software development infrastructure, sends out a tweet to advertise one of their services:

If you're ready for a build server so pretty you could take it to the prom,
you're ready for @Atlassian Bamboo.

The response is immediate and harsh:

Sexist ads won't win you fans!
Grow up and don't use gendered terms to promote your tech products

A reply is made:

Sorry you don't like the wording!
We weren't being gender specific though. Men are pretty too!

Finally, cue the condescending follow ups:

For fuck's sake, way to exhibit absolutely no understanding whatsoever of the subtleties of patriarchy. Get educated.

Look closely and you'll see this pattern pop up more and more, in various forms. The key word is always educate, or more accurately, re-educate. The tone varies from feigned concern to outright hostility. If only you weren't so ignorant, you wouldn't have made such horribly offensive statements. Apologies are dismissed as insincere, a refusal to admit one's true sins.

But let's step back for a bit and look at what was said. First, Atlassian's reply is right: they weren't being gender specific, they merely compared a piece of software to prom. That's not what the indignant reader saw. They read between the lines, and substituted something like this:

  • Women are expected by society to always be pretty. We think this is great.
  • Prom is a celebration of this institutional sexism. Let's trivialize it by comparing it to server technology.
  • We think you'll enjoy our use of sexism and buy our products.

For sure, everyone has their own interpretation and (I hope) I'm exaggerating. But the tweet's supposed sexism is not actually there. The speaker's intent is completely ignored, the hurt feelings of the offended take priority. The reinterpretation itself is sexist: only women can be pretty.

Shametweet

The worst form of this behavior is what I call the Shametweet. This is when someone retweets a statement—usually a perceived insult directed at themselves—without any further comment. The tweeter seemingly considers it beneath themselves to address the insolence directly. Instead, they choose to demonstrate their superior sensibilities to their followers. Those will then jump to his or her defense, making the problem go away with a single click of a button, while the tweeter maintains an aura of innocent plausible deniability.

To my lack of surprise, it's mostly women who I see doing this, voluntarily turning themselves into objects, letting others claim their agency, and usually men who are all too eager to jump to the rescue, even when it's not requested. Some celebrities do it too, siccing a million followers on a target who failed to stroke their ego that morning. More than a few of these fragile celebs are men.

Objectification

Anita Sarkeesian dislikes sexist tropes and objectification of women in video games and wants to bring this problem to light. As one might expect with anyone who does anything on the internet, trolls show up, and insults and accusations of sexism start flying around. Things get ugly, and valid criticism is lost in a sea of crud. Anita cleverly uses the Streisand effect to her advantage, gets publicity in both feminist and general media, ending in a successful $158K Kickstarter campaign to produce a web video series.

Jezebel, billing itself as "Celebrity, Sex, Fashion for Women", is one of the sites eagerly siding with Anita. It appeals to their readership: a young audience of mostly women who enjoy seeing another woman doing her own thing, more so when it irritates men and advances the status of the sisterhood—if the comments are anything to go by.

Why is Michelle Williams in Redface?

Fast forward. Jezebel asks "Why Is Michelle Williams in Redface?", "You should know better".

Her transgression was to appear on a fashion magazine cover "dressed in a braided wig, dull beads, and turkey feathers [...] in a flannel shirt, jeans, and [...] some sort of academic or legal robe. [...] An attempt to portray reservation nobility [...] like she's the member of another race."

But they don't stop there. This tasteless display is in fact "akin to putting a picture of a Gentile in a stereotypical Jewish getup on the cover of Adolf Hitler's Mein Kampf". Godwin triumphs once again.

The writer may indeed have a point in there somewhere, that is, about stereotypes of First Nations cultures. But the irony is so thick you can spread it like Nutella.

Jezebel eagerly celebrates the advances of women over male-dominated society at every turn, decries Patriarchy and rings the alarm bell whenever supposed standards of equality and self-determination are violated. Now they complain that an industry they focus on, which treats people like objects to be dressed and painted, didn't objectify a woman in a tasteful enough fashion.

They should do an exposé on the Emperor's wardrobe next.

Who is it really, that is pressuring women to be passive, immaculate and above all, politically correct dolls? Is it really all men's fault? Or is it fueled by media and advertising that bills itself "For Women" in giant pink letters, but really seems to be just about "Judging Women" instead, telling them they need to look better, be likeable supermoms as well as executives, but deserve to have it all, honest?

On the other side, gaming sister-site Kotaku asks "She's Sexy. Now kill her?", questioning the "humiliation of sexualized females" in God of War: Ascension. In this game's bloody quest of revenge, after a couple hours of brutally murdering several armies of mythological creatures one by one, you stab the Medusa-like Gorgon in the chest. On top of its giant snake body, right where its breasts are. Gasp.

This scene summarizes "all [the] issues with violence against sexualized female characters in one nutshell." But after describing it in the context of the game, only one real objection remains: "Breasts code some enemies as female, [...] violence against [these] body parts is disturbing," and is not the usual "norm in games".

The game is presenting "a form of feminine beauty that associates exposed, large breasts as beautiful." The author seems to be confusing "sexualized" and "sexy", as if sexualization is only what turns him on—I think it's breasts—and something must be sexualized before it can be arousing. Apparently if the Gorgon had been obese and flat-chested, there'd be no issue in putting it down. Which is exactly what Euryale looked like, the repulsive Gorgon the author must've killed in the previous game.

This attempted pro-woman analysis of sexualized portrayal seems to suggest that a feminized body is automatically sexual, but only if she's hot enough, like say, the "final, sexy boss."

The Social Justice Warriors

Skeptic blogger and retired medical doctor Harriet Hall writes a post, titled I Am Not Your Enemy: An Open Letter to My Feminist Critics. She clarifies exactly what she said and meant on a previous occasion. The comments then continue to argue back and forth about what it all means.

It goes back to a t-shirt she wore at a conference, stating she "felt safe and welcome" and was "just a skeptic, not a 'skepchick', not a 'woman skeptic', just a skeptic". This shirt was apparently so offensive and dehumanizing it reduced one of its victims to tears.

Harriet Hall's controversial tshirt

All of this is fallout from the scandal known as ElevatorGate. A man at a conference asked Rebecca Watson up for coffee in an elevator, after a late night in the hotel bar, and accepted no for an answer. Cue the public shaming based on her one-sided account, using her position as a conference speaker, and the inevitable backlash. The man himself however has wisely chosen to stay out of it and remains unidentified. It prompted Richard Dawkins to point to more serious women's issues one might worry about instead; he was then chastised for speaking from white male privilege. This scandal, entirely based on hearsay, is still going on a year later.

In fact, Harriet's thread features an appearance from Rebecca herself. She takes "ten precious minutes" out of her busy schedule to explain she "doesn't really think of [her] at all", after clarifying why she feels the post talks about her directly. Despite admitting to writing and deleting both a blog post and a private email on the subject, Rebecca says Harriet "doesn’t actually deserve an explanation, [or] real estate in my head" which is why she "let others argue over it". Which she says right after arguing over it.

Does this sound at all familiar? She includes that she would be "concerned for [her] personal hygiene" for wearing one shirt several days in a row. I'm not making this up.

Like Dawkins, I wonder: Don't these people have more important things to get angry at? Are they just self-absorbed, seeking publicity through controversy? Some undoubtedly are, but for the majority I think it's far more simple.

It's fair to ask: why are they so bothered and offended, spending their free leisure time organizing miniature online protests, thread after thread? Was the t-shirt (or the tweet) a direct, personal insult? Did it insult a class of people they belong to? Is it specific enough that someone could reasonably argue it applies to them, but not the next person? No.

So why take it personally? It's because it reminds us of an uncomfortable truth about ourselves or the world. In Atlassian's case, it's that beauty has a dark side, and it gives some people an unfair advantage or disadvantage. Did I get this job because of my talents or my looks? Do I present myself badly? Do people judge me by things beyond my control? Do I have a weird face? It reminds us of all the times we've experienced this ourselves, and if you have children, of all the times they will too. The internet becomes a mirror for our own insecurities, and we read our worries into everything.

In Harriet Hall's case, it's the acknowledgement that life is what we make of it, that people disagree with us more than we like to admit, and that often the best thing to do is shrug and not let it bother you, and focus on results rather than labels. Though again, everyone's interpretation is different.

But we don't want to admit that, our pride does not allow it. We'd much rather explain our unease by assuming it was inflicted deliberately, and we make up convenient reasons why that is so, why we were targeted. See, Atlassian is just another sexist tech company, they can't even tweet without insulting every woman on the planet! Harriet Hall, born in 1945, the second ever female intern in the US Air Force, must be an ignorant ditz when it comes to matters of feminism, because of one smelly t-shirt. If you don't see it the same way, well, you're just not educated enough to read between the lines.

It's both men and women who do it. We can argue who is more at fault until the cows come home, but when it comes to sexism it's fair to say men take the brunt of the blame, and are the ones expected to make amends. It's completely one sided, and it's another one of those convenient excuses that we substitute for the real thing. We don't want to talk about the full complexity at play here. Indeed, the closest feminism gets to acknowledging this is, Patriarchy hurts men too! So it's not my fault, just the result of every single choice I've ever made?

When someone points out that viewing everything through a uniquely feminist and female-oriented lens gives a skewed perspective, a rapid fire meme is returned: "But what about the mennzzz?" Attempting to show that inequality applies to both genders, quite often in women's favor, is considered derailing. Showing that the feminist interpretation of history as unbridled Patriarchy is unrealistic, and that feminism has long ago developed its own oppressive and hateful character, is dismissed as misogyny, even when it's women saying it.

There's more handy tropes to end attempts at nuance and shut down discussion: Check Your Privilege, Stop JAQing off (Just Asking Questions), Mansplaining, Victim Blaming, Nice Guy, Schrödinger's Rapist. The list goes on, and all of a sudden, concerns about gendered slurs no longer apply.

The so-called "safe space" that these online social justice groups claim to seek, is just another word for a censored space, and a hypocritical one at that. It's one where certain ideas and thoughts are not to be uttered, and must be replaced by less realistic and less worrisome ones. But no true safe space exists, as offense is always in the eye of the beholder.

Listening always involves interpretation: what people think they heard, and what they decide it meant.

Women in Open Source

Statistics show that women observe sexism online to a higher degree than men, particularly in tech and open source. Recommendations are made on how to make the community more friendly to women, and most suggestions involve re-educating men to reduce their blindness. More so, it's implied that once the atmosphere is respectful enough, women will join and equality will be achieved.

Gender in open source (2006)

Sorry, but I don't buy it, because as late as 2006, 28% of participants in proprietary software were women, but only 1.5% in open source. Most open source projects start out as hobbies, created by one person in their spare time. If the community were such a sexist hell for women, wouldn't you expect the web to be littered with the abandoned works of that quarter of professionals who are women, turned off by how their work was received once published? Instead, I find that female-founded projects are few and far between, and calls for women to participate consist mainly of inviting them into existing projects, and speaking at established conferences about existing technologies.

Is the increasing role of women in open source a consequence of empowerment and self-direction? Or does it stem from the fact that open source is becoming more important in commercial use, and now more women are tagging along? It's both, naturally, but the huge gap between the two gender ratios can't be reduced to abuse and sexism. For a multitude of reasons, women simply aren't as interested as a group.

A big part of the problem is confidence, and starts much earlier: you must be this smart to be in open source, or so people think. Angela Byron, winner of the 2008 Google-O'Reilly Best Contributor award, called to "Fight the Einstein Perception" in Women in Open Source. It took Google's Summer of Code to convince her to take the plunge and make the career change. Programs like that are great to bring fresh talent into a community, but they won't cause the seismic shift in gender balance that feminism requires. If we want more women in open source, shouldn't we encourage them to just do their own thing, as those 98.5% of contributors who were male seemed to be doing?

Open source is claimed to be a meritocracy, but it really isn't. Once two people start modifying the same code, politics get involved, and I can certainly speak from experience that decisions at the top of an open source project are more about people and their interests than code. It isn't enough to create a good solution, it must be advocated and accepted, and apply to a wide variety of existing scenarios. If the work isn't good enough and fails, reputations take a hit. Like this:

Linus Torvalds

Linus Torvalds can act like a complete asshole, self-admittedly so, chew out his (male) contributors, and nobody in particular seems to mind. Linux is successful either despite or because of it.

Linus builds and directs software millions rely on. His abrasive tone reflects the importance of the issues he deals with on a daily basis. So far, his peers have deemed it socially acceptable. You may hate this, but you can't ignore it.

Can we really say with a straight face that he could talk the exact same way to a female contributor, and nothing would be different? In a culture where "never hit a woman" is considered a valid rule by many, men are the default assumed aggressor in domestic violence, and expected to chase the burglar—another man no doubt—out of the house to protect their wife and children? Or would it spawn thread after thread of discussions of just how bad the transgression was, and how to make sure it never happens again?

Open source culture is quite competitive, but the biggest problem an open source contributor has isn't criticism, it's getting people to pay attention in the first place. Ironically, this is something women are innately privileged in: studies show women have automatic in-group bias—women like women more than men like men—that people prefer their mothers to their fathers, and men are universally associated with negative behavior such as violence. It's propagated in the popular stereotypes of the bumbling husband, the insensitive jock, the aggressive bully, and so on.

National Geographic: Ladies Last

That perspective is dismissed by feminists as lashing out from male privilege, and the fear of losing it. But how privileged are men over women, when their life expectancy recedes further from that of women the lower the standard of living? Is there a Kickstarter I can donate to for that? No, instead National Geographic states matter of factly that it's a "troubling trend" and a "wake up call" that men's life expectancy is getting closer to that of women in the US, because it means women are gaining less. They use the margin by which women outlive men as if it's some sort of index of prosperity.

Hey, remember that time when Hillary Clinton said "Women have always been the primary victims of war"? Because they "lose their husbands, their fathers, their sons in combat." A woman who survives is more of a victim than a man who dies for her, please be sure to educate yourself on this.

Could it be that the sexism women say they are constantly subjected to online, is merely the flipside of a coin? One that allows them to cultivate attention with nothing more than a well-chosen avatar, and which men eagerly give to them? How many women forego the make-up in their profiles and videos before lamenting the unsolicited date proposals, awkward as they may be?

I'm not ignoring cases like Kathy Sierra and the persistent, real harassment she received, but let's not forget that it was inflicted by individuals upon individuals, not on womankind.

When the overwhelming majority of open source contributors are men fighting for recognition, do you suppose some of them might feel some resentment that a woman can walk into a room, real or virtual, and make everyone's head turn? If so, do women's concerns deserve automatic precedence over men's? The country I live in has a Minister for the Status of Women after all. Not for Equality.

The Anti-Harassment Policy

To attend or speak at JSConf, you must agree to a code of conduct. Its goal is to create a positive, harassment free environment, something which I am all for. The policy is starting to be adopted verbatim by other conferences, like PyCon.

But the wording explicitly defines harassment as including "offensive verbal comments", specifically "related to gender, sexual orientation, disability, physical appearance, body size, race, religion, sexual images in public spaces, deliberate intimidation, stalking, following, harassing photography or recording, sustained disruption of talks or other events, inappropriate physical contact, and unwelcome sexual attention."

How many of the storms in teacups above would fall under this wide umbrella? If the yardstick to be applied is offense, then this basically forces everyone to walk on egg shells and admit guilt ahead of time. "Participants asked to stop any harassing behavior are expected to comply immediately." There is no room here to discuss the merit of a particular case, to measure the validity of a claim.

Keeping it on-topic: the problem with discussing sex at technical conferences

Indeed, the latest is that we cancel the talk first, ask questions later, based on the concerns of a single complaint over a title without a summary. The threat of going public was possibly made, but accounts differ. I find the Ada Initiative's first response to the situation revealing.

While stressing the real issue is staying on topic and not devolving into unnecessary sexual talk, every negative point raised appears to concern only women. "Sexual topics [...] can be perceived as encouragement to humiliate, objectify, and assault women, regardless of the intent of the speaker." And, "Many people are unable to separate 'talking about sex' and 'saying derogatory things about women'." Their response shows they assumed the talk would not be "done in a woman-positive way". That is, a talk featuring a female speaker who blogs about harm reduction.

At no point do they express regret at having silenced a voice. "Be considerate and thoughtful," it ends.

Let me borrow a quote from Stephen Fry: "The only people who are obsessed with food are anorexics and the morbidly obese, and that in erotic terms is the Catholic church in a nutshell." You'll never see more talk of sexism and rape than on feminist websites.

Trigger warnings, humiliation, objectification, assault, rape culture: feminism's opinion of men's and women's ability to act maturely around each other doesn't seem particularly high.

As an aside, have you ever noticed how Tumblr isn't just a hub for bold feminism, but also erotic fanfics? And by 'erotic' I mean gay sex of dubious consent set in the Twilight universe. You know. Rape. That fangirls write and fantasize about. And joke about in hushed tones at Comic-Con. Is that woman-positive enough, or are the lines blurring a bit?

More recently, someone lost their job after public shaming involving an overheard and misinterpreted comment about "forking" and "dongles", and the guy still felt the need to apologize profusely to the female offendee. Her media presence exceeds his by far and includes tweeting about "[putting] something into your pants [...] like a bunch of socks". Meanwhile followers thanked her for her bravery, that is, snapping a picture with a smile and throwing it to the lions. Who was abusing who here?

Of course it blew up into its own internet storm, but can you blame people for responding in kind to an example that's been so clearly set?

People read Woman fired for getting upset at man's joke and fill in the rest of the story themselves, like this animated GIF equivalent of a temper tantrum. More dignified publications instead carefully explain "Why asking what [she] could have done differently is the wrong question", that is, the one question in this entire fiasco the rest of us could actually learn something from.

Judging a book by its cover is the new tolerance. We throw people into the stocks based on feelings while ignoring intent and assuming victimhood. This is why I fundamentally disagree with equating offense with harassment: it provides unlimited ammo and shuts down discussion rather than giving people the benefit of doubt. It elevates the exception to the norm, by presuming the worst.

Here's a clause I'd like to see instead: if you choose to air minor incidents in public one-sidedly—or threaten to do so—rather than resolving the matter in private, you lose by default. Leave the soapbox for the people who actually need it. Also, if a speaker has been invited and has spent time preparing a talk, it's the most basic courtesy to honor that invitation, no matter what. Let people judge it on its own merits. We attend conferences to hear other points of view, not to be sheltered from them.

As for the creeper move cards, please toss them out, because that's not how adults resolve differences. How gender-neutral is the word creep anyway, and how would you respond to being dismissed with a generic scrap of paper printed from the internet?

If you reduce communication to such a passive aggressive and childish statement, color me unsurprised when you receive an equally childish response, especially in a community that thrives on subversion and creative re-use of things they're not supposed to toy with. It's the exact same attitude that protects us from DRM, eagerly tests claims of privacy and security, and liberates closed technology for those without access. You cannot have one without the other.

Conferences are social gatherings, and sexuality is a normal part of that. I know several happy couples who met at a tech conference, coming from different cities or even countries. Are we to assume that none of them used this opportunity to hook up, and that relationships never happen without ambiguity and misunderstanding? It's not a binary choice between tweeting #ITappedThat and turning conferences into convents.

But why does it seem like there are so many socially maladjusted men roaming these conferences? Does anyone care about the reasons at all, like say, the high rate of autism-spectrum disorders among geeks? Could it be due to the emphasis schools and universities place on non-intellectual pursuits like sports and popularity, and the bullying that results from it? Because it seems to me what some socially awkward hackers have done is exactly what the social justice warriors want: they've created a safe space for themselves, where only their own rules apply.

I never hear much about the effect "Jock culture" has on men, but quite a lot about "Rape culture" and women. We stereotype geeky men as neckbearded basement dwellers whom women are to be protected from unilaterally, rather than working towards real resolution. I don't mind the word neckbeard personally, it can be a humorous badge of pride, but if it's offensive to anyone, surely that's men, not women?

Neckbeard Republic

Beating Which Odds?

In a post titled, Beating the Odds, the JSConf organizers explain how they got 25% of their speakers to be women. The choice quote is: "Our industry systematically biases against 50% of great speakers and misses out on a significant amount of talks, topics, discussion and thus progress." The argument is that, despite only 10% of proposals coming from women, an anonymized selection process disproportionately favored female speakers.

Under a more traditional selection process, these women's valuable and apparently superior contributions would have been ignored. Note how they ignore the ratio of men and women in the industry, and assume this would not affect the gender ratio of good candidates: 50% of them are assumed to be women. That's not how statistics work.
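To make that concrete, here's a back-of-the-envelope expectation, taking their own 10% submission figure and granting, for the sake of argument, the very assumption under test (talk quality independent of gender):

```latex
P(\text{accepted talk is by a woman}) = P(\text{proposal is by a woman}) = 0.10
\quad\Rightarrow\quad
\mathbb{E}[\text{women among } N \text{ accepted talks}] = 0.10\,N
```

A truly blind draw from that pool lands at 10% in expectation, not 50%, and not 25% either without something else shaping the pool first.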

The results: "Our highest ranked talk is from a woman and we know we wouldn’t have gotten that talk without the outreach we did." And: "We invited 35 women to submit to the [Call For Proposals], of these 13 ended up submitting one or more proposals, 5 women submitted on their own."

So basically, there is a significant amount of pre-selection going on here. In their outreach to female candidates, organizers naturally prefer women who they already think will make good speakers. These candidates then further self-selected based on their own confidence and skill. Less than half of female speakers submitted on their own. Meanwhile, the 162 proposals from men came from the usual pool, requiring no unique outreach. Despite extolling the virtues of anonymized selection, the process was biased to favor talented women from the get go, and it's no surprise women sent in better proposals as a group.

Given women's rates of participation in commercial and open source tech, getting 25% female speakers is a high number; assuming fair random sampling, it would be beating the odds. But it's not random at all. The cure for sexism is apparently... more special treatment for women?

It also bothers me on a personal level: I'm gay, and feel equally excluded when someone puts a picture of Natalie Portman in their JavaScript talk. But even if I wasn't, who's to assume my opinions on the matter would fall in line with the cliché? When people do diversity spot checks of speaker panels and rally the horde, I get counted as just another dude propagating patriarchy and heteronormativity. What does it tell you when the first thought upon seeing a lone woman in a line-up is token female rather than trailblazer?

Now, I'm not against setting a good example, and I realize the perception of a boy's club can be a barrier to entry. However that shouldn't distract us from what equality of opportunity actually looks like. In tech, it's nowhere near a 50 / 50 gender split, because the imbalance starts much earlier, with more men than women going into STEM fields, despite the fact that 3 women now graduate for every 2 men.

Can we at least give women the benefit of the doubt and assume that they go after what interests them, rather than being unable to choose differently? Even in the most gender-equal country in the world, Norway, STEM fields are still male dominated and the social sector remains female dominated, despite decades of fervent pro-equality policy and education.

Hjernevask

How solid and gender-neutral is the research that traces this all back to social pressure? The 2010 documentary Hjernevask (Brainwash) provided a very revealing answer to this question and others, causing a stir in the Norwegian academic community. I highly recommend watching it. I found the resemblance to creationism and intelligent design striking: supposed scientists were dismissing observations out of hand because of perceived implications, questioning the author's motives instead. But sexual dimorphism doesn't imply patriarchy, any more than evolution implies social darwinism.

Some choice facts from honest nature vs. nurture research: even day-old babies show a measurable difference in interest between boys and girls when presented with both a mechanical toy and a human face. Genetically identical twins have similar IQs and depression rates, and research with adopted children shows a similar relation to their biological parents, much more than to their adoptive ones. This is no reason to treat individuals any differently, but some averages do differ innately across gender lines, and I don't see that as something we can or should fix by overcompensating.

Breaking Out of The Filter Bubble

Above all, there's a common thread I can't ignore. The women I admire and respect in tech earned that standing primarily on their own merit, letting nobody speak for them but themselves. Like the men I look up to, they point people to their accomplishments, not their likeability. Their Twitter bios don't consist of one ism after another, showing their adherence to a pre-approved set of beliefs. They don't let random trolls derail them, and they don't find themselves at the center of fires of their own making, expecting others to put them out.

It's also the ideal I aim for. When a couple thousand people on YouTube told me I had no life, I laughed my ass off at the absurdity. I'd just created an accidental experiment in viral media, and learned tons in the process. Meanwhile they just watched a video they apparently didn't like, and then wasted more of their time to point this out. They weren't talking about me, they were talking about themselves.

When people told me I killed Unix, that I should be shot, and that I was just some idiot designer who didn't understand code, I didn't have the privilege to retweet the offense and let my posse roll in. I could only ignore it, taking the reputation hit, or refute the misconceptions with arguments and insight, changing people's minds one post at a time. The arrogant Unix greybeards who bugged me in private? Simple: you bait them into telling you everything they know, pan for gold amongst the mud, and move on. One person against the might of Twitter, HackerNews and Reddit: it's really not so bad, just don't take it too seriously. Once the novelty wears off, the bystander effect kicks in, unless you keep stoking the fires yourself.

Of course, I did let it inform my choices: I stopped working on that project in public, realizing I wasn't going to get much useful participation until much later, and I could do without the distraction. But it no longer bothers me, it's just one in a long line of useful experiments. The lingering frustration I feel is about people's short sightedness, not bruised ego. Ever since then, I treat the internet like I would a lovable-but-backwards grandparent, who makes racist comments over Christmas dinner. Yes Grandma, it's all the damn commie jews and faggots' fault, now, who wants dessert?

No, I don't feel bad for dropping those sacrilegious words in there just now. I like to think you are mature enough to let those letters pass under your eyes without burning me at the stake because they remind you of something unpleasant. I trust you to focus on the couple thousand words I started with, rather than just two at the end. See, the reason people say the n-word instead of nigger when talking about racism is that they don't yet realize they too would have owned slaves back then.

When the internet gets its panties in a bunch for the umpteenth time, it's worth asking: where are people getting their information from? The plural of anecdote is not data, after all. Every incident I've heard of lately was massively blown out of proportion. Kony 2012 anyone? Look, finally a cause we can all be equally offended by.

Women are adamant about not being pigeonholed by their gender. I see no reason why we should encourage and celebrate doing it to men. Whether male or female, or any of the shades in between and around, people can have wildly different points of view, and reducing everything to a gender battle is as myopic as pretending no issues exist at all.

The most reasonable people are now afraid to speak their mind. They rightly fear being shamed and harassed by those who scream the loudest about abuse. I've debated writing about this for a while, because I know what a certain part of the response will be. But I'm not the only one saying it, so I'm doing it here, once, in full length, with honest citations, after discussion with people of experience. Women and men, in case you're wondering. "Good luck" was a common theme.

Remember, I'm not the one trying to make hay out of gender issues, turning them into ad revenue, TV appearances or book sales. In my line of work, we're expected to fix things, not just tell people they're broken in increasingly hyperbolic words.

Don't man the cannons or summon the horde. Instead, go check out the ton of links I just dropped into your lap, listen to what's already been said, and see if you can't hear the sound of a record skipping somewhere in the distance. It's not the one you think it is.

For the future then, something to think about. If I step outside, I can walk a couple blocks in any direction to encounter these.

I've taken the liberty of making them more honest:

Dead Rocks

Audi

This is what we allow advertisers to paste onto our streets, our newspapers, our TV shows. Our brains. And then the media turns around to tell us how everyone's being selfish and insecure, but sexism is to blame.

As a smarter person put it, it's narcissism repackaged as a gender battle.

Don't say it doesn't affect you, not when a picture of dollar bills makes you more reluctant to help someone pick up pencils.

Why Android Hates You

Usability, Affordance and Grannies in Vegas

Android Tablets

For quite a while, I've been an Android user. After upgrading to a Nexus 4 several months ago, I'm living the Ice Cream Sandwich and Jelly Bean dream, enjoying a beautifully designed user experience, just like Matias intended, right?

Nope.

Android has a problem and no, this is not about fragmentation. I'd like to talk about Android's contempt for its users instead. See, no matter how many pages of tastefully designed User Interface Guidelines the team puts together, Google just can't seem to shake its deep-seated belief: that users are the most inconvenient and unreliable part of the cloud, and that adequate mediocrity across the board is the best you can do, rather than taking a holistic approach and excelling.

Strong words, for sure. Ever tried to get support from Google? If you've studied how objects are supposed to be designed, owning an Android phone is an exercise in frustration. It's a daily reminder of bad decisions, performed with a stubborn flourish, by someone who maybe doesn't have the skill or desire, but more likely, just isn't given room to do better.

None of what follows are mortal sins, let's be clear on that. But they're missed opportunities in making Android actually pleasant, a last mile of frustration, some of which Steve Jobs would probably have fired people over.

Don't Get Comfortable

Since the dawn of time, my Android phones have claimed that app data and settings will be backed up to the cloud, courtesy of Google. By "data and settings" they mean "except the stuff you actually care about".

It comes as absolutely no surprise that none of them have ever restored the layout and arrangement of icons and widgets on the home screens from the cloud. The only thing that seems to happen is that a bunch of apps get downloaded one after the other, spazzing out your notification bar for up to an hour if the connection is slow, filling up your app drawer while you're trying to put things back. Actually showing the progress of the entire restoration, perhaps estimating when you can expect things to be back the way they were, is a pipe dream. Backing up to the cloud is done manually through apps, like G+ photo upload, or other third-party mechanisms.
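None of this is for lack of platform plumbing, either. Android has shipped a backup API since Froyo that could cover exactly this kind of state. Here's a minimal sketch of a launcher backing up its home screen layout, assuming (my assumption, not Google's actual code) the layout lives in a SharedPreferences file:

```java
import android.app.backup.BackupAgentHelper;
import android.app.backup.SharedPreferencesBackupHelper;

// Sketch: register the launcher's home-screen layout prefs with
// Android's cloud backup transport. "home_screen" is a hypothetical
// SharedPreferences file holding icon and widget positions.
public class LauncherBackupAgent extends BackupAgentHelper {
    static final String PREFS = "home_screen";
    static final String KEY = "home_screen_prefs";

    @Override
    public void onCreate() {
        addHelper(KEY, new SharedPreferencesBackupHelper(this, PREFS));
    }
}
```

Hook it up with an android:backupAgent attribute in the manifest and a BackupManager.dataChanged() call whenever the layout changes, and the transport handles the rest. Which the stock launcher evidently never did.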

After restoring your phone, you can also look forward to repeating all the patronizing tutorials and "Don't stick your fingers in the sockets" warnings of Google Now, Maps, Navigation, Goggles, etc. and certainly not removing "U.S." and "Sports" from your default news subscriptions, because that's what everybody likes.

The fact that nobody at Google seems to consider this a problem is quite telling. But then, so is the default layout you get out of the box, or in this case, after "restoring my settings". Only the wallpaper was saved:

Backup settings

Downloading Twitter

Never lose my stuff
“Save what people took time to create and let them access it from anywhere. Remember settings, personal touches, and creations across phones, tablets, and computers. It makes upgrading the easiest thing in the world.”

Android out of the box

All the Google things are lumped into one folder, which includes the amazing labels "Play Music" alongside "Play Movie…", "Play Maga…", "Play Books" and "Play Store". The hesitant newbie has to make the cognitive leap that "Play" is not a verb here—despite the tacky golden rapper headphones—rather than conclude that Google has weird taste in words, or that this is a phone for children.

The action verb they were looking for instead was "Buy Music" and "Buy Movies". It seems they replaced the media player with a shop you can keep your own tunes in. My advice: get rid of the advertisements posing as widgets and use DoubleTwist instead.

Actually presenting functionality is a job left for the App Drawer, paralyzing you with choice by Pokémon. There are two Mails, three Googles, three Messengers, a whole Play-set, as well as assorted circles, pins and triangles. You'll find yourself dragging every app you use regularly somewhere more convenient. Open App Drawer. Pan. Draaag. Repeat.

Google folder

App Drawer

As an aside, out of the box, don't expect that people can just send you geo coordinates, video messages, meeting invites or office documents. Whenever it's really mattered that I read something on my phone right now, Android's given me more "I don't know what to do with this" errors than even my Windows 98 box back in the day. Clearly this is just a matter of hooking up the right pre-existing bits, and the newer releases are better. But still, it feels like a lot of real world use cases weren't tested, in particular without an internet connection.

Just recently, I couldn't turn off the Satellite layer in Maps because I didn't have data, even though I'd cached the whole city I was in. Bug or fluke? Who knows. And when Google Now showed me flight info for the wrong day, there didn't seem to be a way to correct it. The algorithm knows better than you do, human.

Decide for me but let me have the final say
“Take your best guess and act rather than asking first. Too many choices and decisions make people unhappy. Just in case you get it wrong, allow for 'undo'.”

Don't Touch Me

As late as the Nexus One, most Android phones weren't really multi-touch. They had separate X/Y sensors, capable of distinguishing two touches from one, but unable to tell whether they were rotating clockwise or counter-clockwise. As a result, multi-touch has always been tacked on as an extra: despite now being capable of tracking 5 or even 10 touches, most surfaces still only respond to one.

This all comes down to the concept of affordance: what do our devices suggest we can do with them, and what do they actually allow us to do? For example, based on the design of the handle, doors generally afford either pulling or pushing, giving strong hints as to how to open them. Instead of arguing over whether things should look skeuomorphic, how about making them act skeuomorphic?

Take for example the notification drawer. One way people often interact with real drawers is to pull them out with one hand and reach in with the other. You could try doing this with the notification drawer: pull down with one finger, touch or swipe a notification while holding the drawer half open. It doesn't work: you can only interact with one thing at a time. It's not a drawer, it's a hand-operated modal dialog in disguise. This is obviously just a detail, but notable nonetheless.

More importantly, some notifications can't be dismissed despite looking identical. While responding to swipe gestures, they slide back with pixel-perfect inertia, glued in place with invisible chewing gum. If you're out of cell range, or simply don't wish to incur roaming costs, the "New Voicemail" message will not go away until you call in, even if it was probably VoIP spam.

The drawer is multi-functional: dragging down with two fingers brings up a dashboard of tiles. Unfortunately, this is a hard gesture to get right 100% of the time. Hence, it appears a clever Android programmer made it so you can add the second finger in the middle of the gesture. Convenient, but it's botched in a way that suggests an accident rather than a feature: the second finger has to land exactly on the gripper, instead of anywhere in the vicinity, which is even harder than placing both fingers correctly in the first place.
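For what it's worth, the fix is a few lines of hit-testing slack. A sketch, with a hypothetical gripperBounds rect for the handle and a generous slop margin (the names and numbers are mine, not Android's):

```java
import android.graphics.Rect;
import android.view.MotionEvent;

// Sketch: accept a second finger that lands anywhere *near* the
// gripper, not just exactly on it. gripperBounds is the handle's
// on-screen rect; slopPx would be derived from ViewConfiguration's
// scaled touch slop, multiplied up for a fat-finger margin.
boolean secondFingerHitsGripper(MotionEvent ev, Rect gripperBounds, int slopPx) {
    if (ev.getActionMasked() != MotionEvent.ACTION_POINTER_DOWN) return false;
    int i = ev.getActionIndex();
    Rect tolerant = new Rect(gripperBounds);
    tolerant.inset(-slopPx, -slopPx); // negative inset grows the target
    return tolerant.contains((int) ev.getX(i), (int) ev.getY(i));
}
```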

These tiles are another boondoggle: can you tell which ones you can long-press to toggle and which ones you can't? By the way, that long-press has its own non-standard delayed acknowledgement, ensuring you'll accidentally toggle things back and forth a couple of times before you get used to it. Just how much less functional is this than a real slider, some big switches and a few separate icons?

This Flat UI theme has brought with it tons of unlabeled icon buttons. In some places you can long-press on an icon to get a tooltip describing it. Here too the interaction is cumbersome, requiring a long-press for every individual button rather than being able to slide your finger from icon to icon. The natural gesture of discovering a surface by feeling it has been reduced to a mechanical process of tap-hold-repeat. Heck, why not make use of the giant screen, blow up the icons, show tooltips with arrows to everything around your finger, do something more sophisticated than what Microsoft Office did.

Notification Shade

Notification Shade

Tooltip

If it looks the same, it should act the same
“Help people discern functional differences by making them visually distinct rather than subtle. Avoid modes, which are places that look similar but act differently on the same input.”

Give me tricks that work everywhere
“People feel great when they figure things out for themselves. Make your app easier to learn by leveraging visual patterns and muscle memory from other Android apps. For example, the swipe gesture may be a good navigational shortcut.”

Then there's the new Gesture-typing keyboard. If your nails are a bit long, or your hands a bit sweaty, or you're wearing thin gloves, you may find it difficult to form basic words. As soon as your finger leaves the pad for a fraction of a second, your gesture is interrupted, spawning a bunch of syllables, even if you continue swiping and the line continues cleanly. It seems the physics of how fast humans can actually move their fingers is ignored here in favor of blind sensor obedience. If a gripped finger strays into the pad from the side, it'll stop working altogether. Oh and don't try to type the word "jerking", it's not allowed. What do you think this is, a real keyboard?
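The physics suggest an obvious fix: a grace period, so the gesture survives a sub-100-millisecond sensor dropout and resumes if contact returns nearby. A rough sketch, with illustrative thresholds of my own choosing:

```java
import android.os.SystemClock;

// Sketch: tolerate brief sensor dropouts during a swipe instead of
// ending the gesture on the first lost frame. Thresholds are
// illustrative guesses; a real keyboard would tune them.
class SwipeDropoutFilter {
    static final long GRACE_MS = 100;     // max airborne time
    static final float MAX_JUMP_PX = 180; // max travel while airborne

    private long upTime;
    private float lastX, lastY;
    private boolean airborne;

    void onMove(float x, float y) { lastX = x; lastY = y; airborne = false; }

    void onUp() { airborne = true; upTime = SystemClock.uptimeMillis(); }

    // On a new touch-down: continue the old gesture, or start fresh?
    boolean shouldResume(float x, float y) {
        if (!airborne) return false;
        long gap = SystemClock.uptimeMillis() - upTime;
        float dx = x - lastX, dy = y - lastY;
        return gap <= GRACE_MS && dx * dx + dy * dy <= MAX_JUMP_PX * MAX_JUMP_PX;
    }
}
```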

Finally if you use Google Talk, you get to enjoy a strange bug where it randomly gets stuck in a loop and forcibly scrolls to the bottom every couple of seconds, at random intervals, resulting in the most frustrating user-computer tug of war I've encountered. Fun times.

Gesture Typing

It's For Your Own Good

Now, everyone knows headphones can damage your hearing. This is why the EU passed laws to help ensure consumers would be protected from themselves: devices like iPods are limited in the sound level they can emit. If you've ever sat next to someone and listened to the kacrunch-kacrunch that their hip-hop or dubstep sounded like to everyone on the outside, you may think this is a very good idea.

Unfortunately, it takes a politician to think you could legislate against people being stupid, and a lawyer to come up with the convoluted verbiage that makes it seem sane to the rest of us. Then you need a developer to do a really hare-brained job in implementing it. Hence, in line with EU legislation, Android 4.2 introduces a new feature, for all users everywhere:

My spoon is too big

Only interrupt me if it's important
“Like a good personal assistant, shield people from unimportant minutiae. People want to stay focused, and unless it's critical and time-sensitive, an interruption can be taxing and frustrating.”

Get to know me
“Learn peoples' preferences over time. Rather than asking them to make the same choices over and over, place previous choices within easy reach.”

This pops up whenever the device thinks you're raising the volume above a safe level. I say 'thinks', because reports suggest the behavior differs based on what the phone is plugged into, with some users never seeing it, and others complaining about it endlessly. In any case, it's clearly broken for two reasons.

One, it is ridiculous to think a phone could know whether the volume is unsafe: it could be plugged into a speaker dock, an amp or a recording device. Some headphones also have an additional volume control on the cord, and regardless, the amount of sonic power transferred depends on the design of the headphones and whether they fit the person's ears well.

Two, the feature is designed in the most user-hostile manner. If the screen happens to be off or out of sight when you push the volume above 50%, you won't notice anything except that your button stops working until you wake up the screen and press "Ok". Worse, after a certain amount of play time has passed, the volume will automatically be kicked back to 50% and the pop-up will return, for that's what the law apparently requires. There's no way to disable this 'feature' permanently. Volume regulation on Android is broken from a user's point of view.

Ridiculousness of law aside, there are any number of ways this could've been done more elegantly. An audible signal could be played along with the message, to provide feedback rather than failing quietly. Pressing volume up again (or twice more) should be interpreted as consent and dismiss the message. Users should be able to permanently disable the warning, for any situation where their hearing is not at risk, or for anyone who understands that banning smoking in parks doesn't make car exhaust any less toxic. And if the lawyers tell you to annoy the hell out of your users, only do it when you are absolutely sure it's required, not across the board, to everyone, with one of the most unimaginative implementations ever.
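Even the consent variant fits in a dozen lines. A sketch of "press volume-up again to confirm", with placeholder hooks where the system's own volume and warning plumbing would go:

```java
import android.os.SystemClock;
import android.view.KeyEvent;

// Sketch: show the loudness warning once, then treat another
// volume-up press within a short window as informed consent,
// instead of silently eating the keypress. Assume this is only
// invoked when crossing the "unsafe" threshold. raiseVolume()
// and showWarning() are placeholders, not real framework calls.
class VolumeWarningPolicy {
    static final long CONSENT_WINDOW_MS = 5000;
    private long warnedAt = -1;

    boolean onVolumeKey(int keyCode) {
        if (keyCode != KeyEvent.KEYCODE_VOLUME_UP) return false;
        long now = SystemClock.uptimeMillis();
        if (warnedAt >= 0 && now - warnedAt <= CONSENT_WINDOW_MS) {
            warnedAt = -1;     // second press = consent
            raiseVolume();
            return true;
        }
        warnedAt = now;
        showWarning();         // audible and visible, never silent
        return true;
    }

    void raiseVolume() { /* placeholder for the system's volume code */ }
    void showWarning() { /* placeholder for the warning UI + beep */ }
}
```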

Oh and notice how it charmingly blacked out the copy of Transformers that Play Movies was playing in the background when taking a screenshot. Google Play clearly excludes sharing, do keep that in mind.

The Wobbly Camera

Moving on, Android 4.2 includes a completely redesigned camera app, with fun features like Photo Sphere and a more minimal UI. It seems pretty slick and follows the crisp white-on-black style of Jelly Bean. However within 10 seconds of using it, it will eagerly show you how stupid it is:

Make important things fast
“Not all actions are equal. Decide what's most important in your app and make it easy to find and fast to use, like the shutter button in a camera, or the pause button in a music player.”

Delight me in surprising ways
“A beautiful surface, a carefully-placed animation, or a well-timed sound effect is a joy to experience. Subtle effects contribute to a feeling of effortlessness and a sense that a powerful force is at hand.”

When you rotate the device from portrait to landscape, it freezes the screen so it can do a completely useless transition. The image doesn't change, the widgets don't actually move anywhere, the controls just rotate out and back in again, and the focus ring trips out for a bit.

Somewhere, a Google developer probably realized this, thought about it, and said "Fuck it, I'll use com.google.android.paint.by.numbers" instead of handling the odd upside down case elegantly.

Switching a camera from portrait to landscape is one of its main affordances. Android manages to jank even this up. To make matters worse, this minimal UI misses the mark: there is only one thing you can do with just one tap, and that's taking a picture. You can argue this makes the camera easier to handle, but I bet there's more than one Nexus 4 user who's never used the front camera, something more important than adjusting the white balance.

Android camera

There's a button for toggling recording mode and a radial menu for everything else, but no labels or hints in sight, not even on confirmation. Meanwhile switching between the pre-set "scene modes" requires 4 taps and a buried menu, which either says "we forgot to include this in our wireframes" or "nobody really knows what this is for anyway".

In the age of HDR imaging and live color filters, is there much reason to model a virtual camera after the confusing set-and-pray buttons of their physical counterparts, many of which only mimic film cameras to humor you?

The Instagram-like filters that you can apply afterwards are functional, but the UI for applying effects and transforms is its own kind of weird, lacking useful feedback in several places. It also requires you to flick an invisible mirror to apply a mirroring effect, after pressing the Mirror button, I shit you not.

The only thing that's genuinely clever about it though is how the camera acts like the newest picture frame in your gallery view, here on the left:

However even this view is hard to use, because it uses the most uncomfortable form of inertial scrolling and snap-to-item seen on the platform yet. It moves like a slot machine wheel, flipping over at the very end. Dear UI programmers: please go learn about easing and the equations of motion, you really can't arse your way around these things anymore, unless the demographic you were aiming for was grannies in Vegas.
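For reference, the equations of motion in question are not exactly rocket science: exponential decay of velocity, then easing out toward the snap target instead of flipping over at the end. A sketch, with illustrative constants:

```java
// Sketch: per-frame inertial scroll with exponential friction,
// easing out toward the nearest item rather than snapping abruptly.
class InertialScroller {
    static final float FRICTION = 0.95f; // fraction of velocity kept per frame
    static final float SNAP = 0.15f;     // ease-out factor toward target
    static final float ITEM = 400f;      // item width in px (illustrative)

    float position, velocity;

    void fling(float v) { velocity = v; }

    void step() { // called once per animation frame
        position += velocity;
        velocity *= FRICTION;
        if (Math.abs(velocity) < 1f) {
            // Close a fixed fraction of the remaining distance each
            // frame, so the motion tapers smoothly into the snap point.
            float target = Math.round(position / ITEM) * ITEM;
            position += (target - position) * SNAP;
        }
    }
}
```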

Real objects are more fun than buttons and menus
“Allow people to directly touch and manipulate objects in your app. It reduces the cognitive effort needed to perform a task while making it more emotionally satisfying.”

network selection

The Network Cocktease

But the most amazing example has to be the Network Operator screen. This dialog, normally buried, is presented when you lose connectivity. It's not just completely dysfunctional, but has remained unchanged for years. We're talking about the feature that turns your phone from a mostly useless brick into a node on the global voice and data network. It's kind of crucial when you need it.

If you're here, it means something has changed. Either you've travelled to a foreign country, you've just spent a noticeable time out of cell range, something's wrong on your operator's end, or the software messed up. Your goal is simple: confirm which network the phone should use and get it reconnecting right now, so you can get back to business.

Unlike Wi-Fi, scanning for mobile networks can take a while. I've tried to dig into the relevant specs to see exactly why this is, and there doesn't seem to be a clear reason for it: networks advertise themselves, cell phones simply listen in, tuning to various channels to do so. But I may have missed something, given that 3GPP specs make W3C specs look like florid prose.

Most users just want the phone to choose the right network automatically, selecting their home network if it's available, or roaming on a preferred partner. This is done using a list stored in the SIM card, and hence all known ahead of time. So what went wrong?
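Before answering that, note how little logic the happy path needs. A sketch of the automatic pick, with plain PLMN strings standing in for the radio layer's real types (my simplification):

```java
import java.util.List;

// Sketch: check scan results against the SIM's preferred-operator
// list, in priority order. Plain strings stand in for real PLMN
// identifiers; the preferred list is known before scanning starts.
final class AutoNetworkPicker {
    static String pick(List<String> simPreferredPlmns, List<String> scannedPlmns) {
        for (String plmn : simPreferredPlmns) {   // priority order from SIM
            if (scannedPlmns.contains(plmn)) return plmn;
        }
        // No preferred partner in range: fall back to whatever exists.
        return scannedPlmns.isEmpty() ? null : scannedPlmns.get(0);
    }
}
```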

Android Network Selection 2

First, the obvious one: despite Android's emphasis on letting background processes and services run freely, scanning for network operators is a modal process. Scanning starts as soon as this screen is opened, and the only way to keep it open is to not touch anything. You have to wait until the phone finds all results to select one. Which means the 99% of users who wish to "Choose Automatically" have to wait for no reason.

But suppose you're travelling and you've set your display timer to a modest 15 seconds to conserve battery. Now the screen will shut off before scanning has completed. You'll want to just let the phone sit for a bit and then pick it up again, right?

Wrong. See, scanning completed ages ago. But because you weren't here to watch the phone do it, you can't actually select any of these options, because the modal popup will now never go away. Tapping it does nothing. Tapping in the dimmed area around it instantly closes the whole screen. If you try to return, the whole scanning process starts over again, because the results weren't even cached for 5 seconds.

This is Utterly Broken. Users go here because they want to pick an option, so it should maximize their opportunity for doing so, not do the opposite and fail completely. If there is a preferred provider available, it should indicate that it knows this, not just suggestively put it at the top of the list while being coy about it.
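Caching alone would fix half of this. A sketch of keeping the last scan around for a short while, so reopening the screen repopulates instantly instead of starting over (the TTL is an illustrative guess):

```java
import android.os.SystemClock;
import java.util.Collections;
import java.util.List;

// Sketch: remember the last scan briefly, so the screen can
// repopulate instantly instead of rescanning because it blinked.
class ScanCache {
    static final long TTL_MS = 2 * 60 * 1000; // illustrative: 2 minutes

    private List<String> results = Collections.emptyList();
    private long takenAt = -1;

    void put(List<String> scanned) {
        results = scanned;
        takenAt = SystemClock.elapsedRealtime();
    }

    List<String> getFresh() {
        boolean fresh = takenAt >= 0
                && SystemClock.elapsedRealtime() - takenAt <= TTL_MS;
        return fresh ? results : null; // null means: actually rescan
    }
}
```

Trivial, and none of it is what this screen does.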

It's also a classic example of Google-Don't-Care: the Nexus One had this screen, the Nexus 4 still has it. Because most people only use this in unusual circumstances, it hasn't been on anyone's radar to merit fixing, despite several major OS revisions. But when you're walking around a foreign city, running late and needing to call someone, few things will make you want to toss your phone against a wall more than this.

It's not my fault
“Be gentle in how you prompt people to make corrections. They want to feel smart when they use your app. If something goes wrong, give clear recovery instructions but spare them the technical details. If you can fix it behind the scenes, even better.”

I should always know where I am
“Give people confidence that they know their way around. Make places in your app look distinct and use transitions to show relationships among screens. Provide feedback on tasks in progress.”

android clock: good

android clock: bad

my phone

The Times They Are A-Changing

Level such criticisms at any good Android fanboy though, and you'll hear back a common refrain: "Yes, but, things are changing! It takes time to change a developer culture."

That would be valid, if some of the worst offenders weren't completely new in the latest upgrade. You can't set a good example with one mediocre experiment after another.

The other rebuttal is, "There's a replacement app for that, that's the beauty of Android." And indeed, I do use a custom launcher and root my phone. But the result is still a series of apps, not a mobile desktop.

The way I see it, there are basically four tiers of Android apps right now.

First, there's all the iOS apps that were ported over with no regard for the platform, which just feel like second-rate versions of better things. Luckily these are being replaced with native look-and-feel, with Twitter being a recent example.

Second, there's the legacy of Android 2.x, whose widgets and pop ups still show up with alarming regularity, serving as a constant reminder to stay away from the Play Store as much as possible.

Third, there's the mismatched Google apps from the No-Man's Land of Inbetween, like Play, Plus, Maps, Goggles, Reader and YouTube, which each have their own spin on the flatter UI with varying levels of blandness vs sophistication.

Finally there's the modern Roboto Thin style of Jelly Bean, which oscillates strangely between a Braun-esque brilliance and a confusing form-over-function Sci-Fi UI not meant to be seen or used up close.

There's nothing to tie it together but a bunch of disjoint cloud services. Are we all going to stay in our isolated messaging silos? Keep on emailing each other DropBox URLs? Substitute mediocre AI for fine-grained control and precision?

As an end-user, Android feels like it's a glimpse of something great now, but there are too many forces pulling it in different directions. If there's someone whose job it is to curate the Android experience and make it excel, their vision is not being realized effectively. Too much manual assembly is still required.

Over the past decade, we've seen what was once just Apple's domain, high-end UX design in software, go mainstream. But I can't help but feel some of the finesse has been lost. Gripes about iOS and OS X suggest the same is true for Apple as well.

You hear a lot about "first time user experience" for example. But it's not about wrapping up your product like a present. It's about creating a connection of trust through empowerment and a little bit of emotional appeal: "This is for you, you can do amazing things with this." And that means "first-time" shouldn't refer to the first time you turn on the device, but the first time you use a device for a particular purpose and context. Travelling to Another Country should definitely be treated as a "first time" experience, same with How Do I Work This Camera, I Don't Have an App For This, I Don't Have Data Right Now, I Dropped It Down The Stairs, I Should've Cached This Map But I Didn't, My Friend Has a Windows Phone, etc. Throwing in more obnoxious tutorials is not the answer, creating affordance is.

One of the most brilliant things Apple did for the iPhone was to make screenshots easy. People could show what they were doing with it, it empowered them to do so, popping up in articles and feeds. It took until Android 4 before you could do that without a USB debugger, so the only screenshots we got were taken by developers. Which platform has which image today? The iPhone isn't actually that intuitive, but most people already knew all about swiping and pinching by the time they picked one up for the first time. But the Android experience cannot come into its own when it's only chasing the walled gardens that already exist. It lacks agency for the user and pretends the cloud can pick up the slack.

The most annoying part is, the worst bugs are well known, and the war over services is obvious. The issue tracker has long been full of reports and me-too-s and dear-god-when-will-this-be-fixed. Owning an Android means you accept having something that's always a bit broken, which doesn't integrate as well with anything you do as an iPhone does with its family, and just substitutes Google for Apple more and more. I'll stay on this side, I like my phone to be hackable, but that means being able to shut off the stupid warnings too, and having an OS actually worth hacking for.

If the insides of this phone were as thoughtfully put together as the outside, Android could be to mobile what OS X was to the desktop in the 2000s. But so far, no dice.

Above all, it's true what Steve Jobs said: Android is a stolen product. After so many years, there is little about the platform that can be said to set it apart from direct competitors. It's now just a mostly well done version of the same thing. I'm one of those old fogeys who has resisted getting a tablet, simply because I don't want a gianter phone. I want a touch computer, with everything that implies, not just something to do email and watch movies on.

Quotes from: Android – Design Principles, Google Inc. (cc)
