Before we begin, let’s review the basics of an AR application that augments the camera image with one or more additional 3D objects. Several components are necessary to get going, so let’s just make a list of everything:
- Display - This is an easy one. We need some kind of display through which we perceive our newly enhanced world. This can be a mobile device such as a tablet or a smartphone, an HMD such as this Oculus Rift mod with additional cameras, or, for instance, one of these fancy but expensive MR HMDs from Canon. Each of these devices needs a frontal camera with which the scene is recorded, preferably one with high resolution and few image artifacts such as noise. Some of these devices even come with two cameras to directly provide a 3D stereo effect.
- Tracking algorithm - Putting objects into real space requires geometric registration; that is, if the camera moves, the virtual object has to stay at its position in real space. There’s a myriad of tracking algorithms, ranging from simple marker “tracking” (in reality it’s just constant re-initialization of a tracker, but markers are very neat for debugging purposes) to line or feature trackers, possibly enhanced by additional gyroscopic sensors to help fight lag or drift. For starters, ARToolKit is an old but open framework for simple marker tracking, and it’s easy to set up. PTAM and others can later be used for much more stable solutions. Since this blog will mostly cover rendering techniques, I’ll rely on ARToolKit most of the time.
- Reconstruction - This one is actually many things. To have our virtual object interact with the real world, we need a reconstruction of the real world. For instance, if our virtual object is to cast a shadow from a real light source we’ve reconstructed, we also need to know which real surface that shadow will fall on. The work invested in this part determines how dynamically the application can react to changes in the scene, such as someone switching the light on and off. I will get to this point in detail below. In any case, the minimum input for a renderer is a set of light sources, so we need to reconstruct the lighting configuration of the scene, as well as the surfaces/geometry to interact with.
- Renderer - The final component is the one that creates the pixels. This piece of software renders the virtual object, masks out real and virtual parts of the new composition, renders light interaction, shadows, interreflections, etc., and glues it all together into one frame. All previous parts come together in this particular piece of code, and here we have to take care that reality and virtuality match up.
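To make the registration idea from the tracking bullet concrete, here is a minimal sketch of what happens once a marker tracker has done its job. It assumes the tracker hands us a 4×4 camera-from-marker pose (ARToolKit reports something equivalent); all matrix values and the object offset are made-up example numbers.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transform(m, p):
    """Apply a 4x4 transform to a 3D point (homogeneous w = 1)."""
    v = [p[0], p[1], p[2], 1.0]
    out = [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]
    return (out[0] / out[3], out[1] / out[3], out[2] / out[3])

# Hypothetical pose reported by the tracker: the marker sits 0.5 m in
# front of the camera, rotated 90 degrees around the camera's y-axis.
c, s = math.cos(math.pi / 2), math.sin(math.pi / 2)
camera_from_marker = [[ c, 0, s, 0.0],
                      [ 0, 1, 0, 0.0],
                      [-s, 0, c, 0.5],
                      [ 0, 0, 0, 1.0]]

# The virtual object is anchored 0.1 m above the marker's origin.
marker_from_object = [[1, 0, 0, 0.0],
                      [0, 1, 0, 0.1],
                      [0, 0, 1, 0.0],
                      [0, 0, 0, 1.0]]

# Composing the two poses gives the modelview matrix for the renderer:
# the object stays glued to the marker no matter how the camera moves.
camera_from_object = mat_mul(camera_from_marker, marker_from_object)
p = transform(camera_from_object, (0.0, 0.0, 0.0))
```

The whole point of geometric registration is that only `camera_from_marker` changes from frame to frame; the object’s anchor relative to the marker stays fixed.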
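The renderer’s “glue it all together” step can be sketched per pixel. The two helpers below are assumptions about how one might structure it, not any particular engine’s API: one blends a rendered virtual fragment over the camera image using a coverage mask, the other darkens a real pixel where a virtual object casts a shadow (the 0.6 attenuation factor is an arbitrary example value).

```python
def composite(camera_px, virtual_px, coverage):
    """Blend a rendered virtual pixel over the camera image.
    coverage is 1.0 where the virtual object was rasterized, 0.0
    elsewhere; fractional values give anti-aliased edges."""
    return tuple(coverage * v + (1.0 - coverage) * r
                 for v, r in zip(virtual_px, camera_px))

def shade_real(camera_px, shadow):
    """Darken a real pixel where a virtual object casts a shadow.
    shadow in [0, 1]; the 0.6 attenuation is a made-up constant."""
    return tuple(c * (1.0 - 0.6 * shadow) for c in camera_px)

# A red virtual fragment covering 75% of a bluish camera pixel:
blended = composite((0.2, 0.3, 0.4), (1.0, 0.0, 0.0), 0.75)
# A gray real pixel fully inside a virtual shadow:
shadowed = shade_real((0.5, 0.5, 0.5), 1.0)
```

In a real system both operations run on the GPU, but the per-pixel logic is the same: reality stays untouched wherever virtuality contributes nothing.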
Having these parts is the minimum needed to start rendering something nice. Additional hardware, however, can make the entire system more dynamic. This is of course limited by how mobile your AR system should end up being. Bullet number three, the reconstruction of reality, is a rather complex issue which can be heavily supported by additional hardware. Let’s have a look at some options:
- Depth sensors provide the ability to record not just the incident radiance at each point in the scene as seen from the camera, but also its distance, which helps to distinguish objects nearer to the camera from those farther away. A virtual object then isn’t always drawn over the background image, but can end up behind a real one. There are several depth cameras available with different methods of sensing distance: time-of-flight sensors measure how long a pulse of ultrasound or (typically infrared) light takes to return from an object. The Microsoft Kinect instead projects a fixed infrared pattern and measures its deformation.
- Furthermore, depth sensors can also be used to reconstruct scene geometry. One of the more famous examples of this is Microsoft’s KinectFusion, which is available in the Kinect SDK. Of course, one has to be aware that a depth sensor can only reconstruct - in a single frame - the front side of things.
- Light probes are sensors which capture the real-world illumination of the scene. The “low-end version” of this used to be a simple shiny ball (still used quite often in movie production), which gives you some sort of environment map of the real world. Another idea is to place an additional camera somewhere with a nice wide-angle lens to capture a hemispherical image. Cameras which can record full HDR HD video at 30 FPS can be acquired from Point Grey, such as the Ladybug series. These can greatly help to figure out dynamically where light comes from!
- Other sensors, directly integrated into switches (for instance light switches), can also be used. One will usually find many of those in ambient-assisted-living environments, but they are still rather exotic. I won’t go into details here.
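The occlusion benefit of a depth sensor boils down to one comparison per pixel. This is a minimal sketch under the assumption that the real depth map and the virtual object’s depth buffer have already been aligned to the same camera; the function name and values are illustrative only.

```python
def resolve_pixel(camera_px, virtual_px, real_depth_m, virtual_depth_m):
    """Keep the virtual fragment only if it is nearer to the camera
    than the real surface the depth sensor measured at this pixel.
    virtual_depth_m is None where no virtual geometry was rasterized."""
    if virtual_depth_m is not None and virtual_depth_m < real_depth_m:
        return virtual_px
    return camera_px

# A virtual object at 1.2 m behind a real wall at 1.0 m stays hidden:
hidden = resolve_pixel((0.2, 0.3, 0.4), (1.0, 0.0, 0.0), 1.0, 1.2)
# With the real surface at 2.0 m, the virtual fragment wins:
visible = resolve_pixel((0.2, 0.3, 0.4), (1.0, 0.0, 0.0), 2.0, 1.2)
```

Without a depth sensor, the only fallback is to reconstruct occluders by hand (so-called phantom geometry), which is exactly the kind of assumption the extra hardware lets us drop.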
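As for geometry reconstruction: the first step of any KinectFusion-style pipeline is back-projecting each depth pixel into a camera-space 3D point through a pinhole model. The sketch below assumes simplified, made-up intrinsics (`fx`, `fy`, `cx`, `cy`) and a tiny 2×2 depth map; real sensors deliver 640×480 or more, and zero marks invalid measurements.

```python
def backproject(depth, fx, fy, cx, cy):
    """Turn a depth map (meters, row-major) into camera-space 3D points
    using a pinhole camera model. Zero depth means 'no measurement'."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0.0:
                points.append(((u - cx) * z / fx,
                               (v - cy) * z / fy,
                               z))
    return points

# Toy 2x2 depth map with one invalid pixel; unit focal length for clarity.
depth = [[1.0, 0.0],
         [2.0, 1.0]]
cloud = backproject(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

KinectFusion then fuses many such per-frame point clouds into a signed-distance volume, which is what works around the “front side only” limitation over time.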
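To show what one can squeeze out of a light probe, here is a deliberately crude sketch: pick the brightest texel of a latitude-longitude environment map and convert it to a world-space direction. Proper light-source extraction clusters many bright regions; this single-texel version, its mapping conventions, and the toy 2×2 map are all assumptions for illustration.

```python
import math

def dominant_light(env, width, height):
    """Return a unit direction toward the brightest texel of a
    latitude-longitude environment map (rows = polar angle from +y,
    columns = azimuth). A crude stand-in for light-source extraction."""
    best, best_uv = -1.0, (0, 0)
    for v in range(height):
        for u in range(width):
            lum = sum(env[v][u]) / 3.0  # mean of RGB as luminance
            if lum > best:
                best, best_uv = lum, (u, v)
    u, v = best_uv
    phi = 2.0 * math.pi * (u + 0.5) / width   # azimuth at texel center
    theta = math.pi * (v + 0.5) / height      # polar angle from +y
    return (math.sin(theta) * math.cos(phi),
            math.cos(theta),
            math.sin(theta) * math.sin(phi))

# Toy 2x2 map with a single bright texel in the upper hemisphere:
env = [[(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)],
       [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
light_dir = dominant_light(env, 2, 2)
```

Feeding such extracted directions into the renderer as ordinary light sources is what makes the relighting react when the real illumination changes.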
The take-away message is this: a great relighting solution can get even better if we add a little bang to it with some neat active sensors. The renderer can then survive with fewer assumptions about reality, which is always a good thing! Overdoing it, however, can impact the mobility and flexibility of the overall system.