If you’ve been watching television coverage of the London 2012 Olympics, you’ve probably seen plenty of impressive visual effects. The underlying technology for generating these effects is quite similar to that used for film and television production, with the caveat that all the effects have to be generated in real time (or quickly enough to be shown in an instant replay).
The most common example, present in the coverage of almost every Olympic sport, is a motion graphic superimposed on the raw video that labels the lanes that runners or swimmers are in, or highlights what line a competitor has to beat in order to win a race. These types of effects have been around for a while, and are created using feature tracking and well-calibrated cameras that know their relationship to the 3D plane where the graphics should go (e.g., the track or pool surface).
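To make the "calibrated camera plus known plane" idea concrete, here's a minimal sketch of how a graphic defined on the track surface could be mapped into the image. It assumes the camera calibration has already been boiled down to a 3x3 homography H from plane coordinates (in meters) to pixels; the function names, the 1.22 m lane width, and the label size are illustrative assumptions, not details of any broadcaster's system.

```python
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Point = Tuple[float, float]


def project_plane_point(H: Matrix3, x: float, y: float) -> Point:
    """Map a point (x, y) on the ground plane to pixel coordinates via homography H."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("plane point projects to infinity in this view")
    # Divide by the homogeneous coordinate to get pixel positions.
    return (u / w, v / w)


def lane_label_corners(
    H: Matrix3,
    lane_index: int,
    lane_width: float = 1.22,    # assumed lane width in meters
    label_length: float = 2.0,   # assumed label length along the track
) -> List[Point]:
    """Return the four image-space corners of a label lying flat in one lane."""
    y0 = lane_index * lane_width
    y1 = y0 + lane_width
    plane_corners = [(0.0, y0), (label_length, y0), (label_length, y1), (0.0, y1)]
    return [project_plane_point(H, x, y) for x, y in plane_corners]
```

A renderer would then warp the label artwork into the quadrilateral returned by lane_label_corners for each frame, updating H as feature tracking follows the camera motion.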
There’s a cool, more advanced effect being deployed in diving and gymnastics coverage in which the camera seems to swivel around the athlete in 3D as they’re frozen in mid-air (à la The Matrix). This is done with a combination of real-time foreground segmentation from multiple cameras, a fast multiview stereo algorithm, and a version of production visual effects software like the Foundry’s Nuke. The underlying approach is clearly explained in this video about the i3DLive system developed by the Foundry, the University of Surrey, and BBC Research and Development. Sorry I couldn’t figure out how to embed it here, but it’s really worth a look.
Chapter 2 of the book is all about Image Matting, the separation of a natural image into foreground and background elements. It’s not quite like putting a jigsaw puzzle together, since the “pieces” are fuzzy (e.g., background partially shows through an actor’s wispy hair). The matting problem gets its name from the way scenes in old-school Hollywood movies were created; expert artists would create large, detailed paintings on panes of glass placed between the camera and the set. The result would be that live action fused (hopefully) seamlessly with the matte. The image above is a classic shot from Raiders of the Lost Ark, where you can see the gray region is the clear part of glass through which the scene was shot. As you can imagine, matching the perspective and lighting of the live action is very tricky!
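The "fuzzy pieces" idea is usually written as the matting equation: each observed pixel I is a blend alpha * F + (1 - alpha) * B of a foreground color F and a background color B. Here's a minimal sketch of that equation in both directions. The function names are made up for illustration, and the alpha estimate assumes F and B are already known, which is exactly what a real matting algorithm does not get for free.

```python
from typing import Tuple

Color = Tuple[float, float, float]


def composite_pixel(alpha: float, fg: Color, bg: Color) -> Color:
    """Blend one pixel with the matting equation I = alpha*F + (1 - alpha)*B."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


def estimate_alpha(observed: Color, fg: Color, bg: Color) -> float:
    """Least-squares alpha for one pixel, ASSUMING fg and bg colors are known.

    Projects (I - B) onto (F - B); the result is clamped to [0, 1].
    """
    diff_fb = [f - b for f, b in zip(fg, bg)]
    diff_ib = [i - b for i, b in zip(observed, bg)]
    denom = sum(d * d for d in diff_fb)
    if denom == 0.0:
        raise ValueError("foreground and background colors are identical")
    alpha = sum(a * b for a, b in zip(diff_ib, diff_fb)) / denom
    return min(1.0, max(0.0, alpha))


# A wispy strand of hair that is 25% opaque over a blue-ish background.
hair = (200.0, 100.0, 0.0)
sky = (0.0, 100.0, 200.0)
mixed = composite_pixel(0.25, hair, sky)
recovered = estimate_alpha(mixed, hair, sky)
```

With only the observed image to go on, every pixel has three known values (its color) but seven unknowns (alpha, F, and B), which is why natural image matting needs extra hints such as a blue screen or user scribbles.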
You can find many pictures of classical matte paintings online; for example, see the great list at Shadowlocked. However, I had a hard time finding a good picture showing a glass painting in line with the camera path to produce a composite. The best I could come up with was this example from a 1980s miniseries called The Last Days of Pompeii, in which the volcano is painted on glass at the upper left and you can see how the painting and real scene line up:
It’s become increasingly easy for the average person to create 3D models of objects simply by taking lots of images. This problem is also known as multiview stereo, and many ways to approach it are discussed in Section 8.3 of the book.
Recently a team of volunteers went to the Metropolitan Museum of Art to take lots of pictures of classical sculptures, which were then processed into 3D models using Autodesk’s 123D Catch software. This free multiview stereo software makes it really easy to make your own 3D models. The multiview stereo algorithms under the hood are from Acute3D, a French company that had a great presentation at CVPR 2012.
Photo manipulation didn’t start with Photoshop! Several famous historical photos were actually manually altered.
For example, this image of General Ulysses S. Grant from the mid-1800s was actually constructed from three source images taken in very different places at different times. General Grant’s head was taken from one image, the body and horse from a different person in another image, and the background from an entirely different scene. Section 3.3 of the book addresses automatic ways to combine pieces of several images into a single composite like this.
This image of William Lyon Mackenzie King with Queen Elizabeth from 1939 is an early example of manually inpainting a large hole with complex texture. King George VI was fully removed from the picture! Section 3.4 of the book addresses automatic ways to solve this problem.
This picture, created by Art Streiber for Vanity Fair, celebrates the 100th anniversary of Paramount Pictures.
This is a great example of photomontage, the kind of effect discussed in this SIGGRAPH 2004 paper by Agarwala et al. and in Section 3.3 of my book. There’s no way you’d get 116 people (much less high-powered movie stars) spread out across a 60-foot-wide stage all facing forward and smiling at the same time. Instead, I’m pretty sure that the final result is composed of dozens of pieces of different photos, seamlessly merged together. (In fact, there’s no reason everyone had to be there at the same time, and they probably weren’t.)
You can make similar results yourself using the free code by Agarwala et al. at the above link.
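If you're curious what "merging pieces of photos" looks like in code, here's a stripped-down sketch. It assumes a stack of already-aligned grayscale source images and a label map saying which source each output pixel comes from, and it scores the seams with a simple color-disagreement penalty of the kind used in graph-cut compositing. This is not the Agarwala et al. code: choosing the labels by graph-cut optimization and the gradient-domain blending are both left out, and the function names are hypothetical.

```python
from typing import List

Image = List[List[int]]   # grayscale rows, kept simple for illustration
Labels = List[List[int]]  # labels[r][c] = index of the source used at pixel (r, c)


def assemble_montage(sources: List[Image], labels: Labels) -> Image:
    """Copy each output pixel from the source image named by the label map."""
    height, width = len(labels), len(labels[0])
    for img in sources:
        if len(img) != height or any(len(row) != width for row in img):
            raise ValueError("all sources must match the label map size")
    return [[sources[labels[r][c]][r][c] for c in range(width)] for r in range(height)]


def seam_cost(sources: List[Image], labels: Labels) -> int:
    """Sum, over neighboring pixels with different labels, of how much the
    two sources disagree at both pixels. Lower means less visible seams."""
    height, width = len(labels), len(labels[0])
    total = 0
    for r in range(height):
        for c in range(width):
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbors
                r2, c2 = r + dr, c + dc
                if r2 >= height or c2 >= width:
                    continue
                a, b = labels[r][c], labels[r2][c2]
                if a != b:
                    total += abs(sources[a][r][c] - sources[b][r][c])
                    total += abs(sources[a][r2][c2] - sources[b][r2][c2])
    return total
```

A full photomontage system searches for the label map that keeps the parts you want from each photo while making a cost like this as small as possible, so the seams fall where the source images already agree.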
Modern blockbuster movies seamlessly introduce impossible characters and action into real-world settings using digital visual effects. These effects are made possible by research from the field of computer vision, the study of how to automatically understand images. Computer Vision for Visual Effects will educate students, engineers, and researchers about the fundamental computer vision principles and state-of-the-art algorithms used to create cutting-edge visual effects for movies and television.
The book describes classical computer vision algorithms used on a regular basis in Hollywood (such as blue-screen matting, structure from motion, optical flow, and feature tracking) and exciting recent developments that form the basis for future effects (such as natural image matting, multi-image compositing, image retargeting, and view synthesis). It also discusses the technologies behind motion capture and three-dimensional data acquisition. More than 200 original images demonstrating principles, algorithms, and results, along with in-depth interviews with Hollywood visual effects artists, tie the mathematical concepts to real-world filmmaking.
Computer Vision for Visual Effects will be published by Cambridge University Press in Fall 2012. Watch this space for more details, blog posts on new visual effects algorithms, and cool demo reels!