Here’s a talk I recently gave that overviews the field of computer vision and its applications to visual effects in movies and television:
The first 24 minutes are a general introduction to computer vision, why it’s difficult, and what kinds of problems computer vision researchers in academia and industry study. The rest of the talk overviews computer vision problems that are encountered in the design and production of visual effects, with lots of stills and videos from movies and TV. The main categories of problems (Matting, Image Compositing and Editing, Features, Dense Correspondence, Matchmoving, Motion Capture, and 3D Acquisition) parallel the chapters in my book.
Thanks to the RPI Cognitive Science Department for hosting and recording the talk.
Matchmoving (also known as camera tracking) is the first step of any visual effects problem in which CGI objects must appear to “live in” three-dimensional space. Often this involves fully-CGI characters interacting with actual sets or background plates (e.g., Optimus Prime battling in Chicago, Dobby in Harry Potter’s room, the Hulk smashing up New York City). However, there are many more subtle applications of matchmoving; one striking effect that’s been used many times recently is the insertion of 3D text into the credits of movies. Below are two examples from movies that you wouldn’t necessarily associate with visual effects (Easy A and Panic Room):
For Easy A in particular, in addition to matchmoving there was also a fair amount of matting and compositing required; for example, every time a person passes in front of the text, the edges of his/her body had to be outlined or rotoscoped. This was also probably a tricky shot considering its length (although since a single CGI object isn’t continuously present throughout the shot, they might have been able to get away with estimating the camera track in pieces).
One of my favorite uses of this effect is from Stranger Than Fiction, seen below.
Actually, in many of these shots the camera is stationary or purely panning/zooming. In the first case, the compositor could probably eyeball where to put the floating text in 3D and tie it to the motion of a single tracked point (like the end of the toothbrush or a point on Will Ferrell’s body). In the second case (also known as a “nodal pan”) the background pixels in any two frames are related by a projective transformation, so text drawn in one frame of a shot can be pushed to the other frames of the shot. However, if the camera’s position is changing, even a little bit, matchmoving is required.
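To make the nodal pan case concrete, here's a minimal sketch of the projective transformation that relates two frames when the camera only rotates about its optical center and possibly zooms. It assumes an idealized pinhole camera with square pixels and a known principal point, and it models the pan as a rotation about the camera's vertical axis. The function names, focal lengths, and text coordinates are all made up for illustration, and the sign of the pan angle is just a convention chosen here. In a real shot the transformation would typically be estimated from tracked background features rather than built from known camera parameters.

```python
import math

def matmul3(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def intrinsics(f, cx, cy):
    """Pinhole calibration matrix K (square pixels, no skew: an assumption)."""
    return [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]

def intrinsics_inverse(f, cx, cy):
    """Closed-form inverse of the calibration matrix above."""
    return [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]

def pan_rotation(theta):
    """Rotation by theta radians about the camera's vertical (y) axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def nodal_pan_homography(f1, f2, cx, cy, theta):
    """H = K2 * R * K1^-1 maps frame-1 pixels to frame-2 pixels.

    f1 and f2 are the focal lengths (in pixels) of the two frames, so a
    zoom is simply f2 != f1. This relation only holds when the camera
    center does not move between the frames.
    """
    return matmul3(matmul3(intrinsics(f2, cx, cy), pan_rotation(theta)),
                   intrinsics_inverse(f1, cx, cy))

def warp_point(h, x, y):
    """Apply a homography to a pixel, including the projective divide."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)

# Hypothetical example: corners of a text box drawn in frame 1 of a
# 1280x720 shot, pushed to a frame where the camera has panned 2 degrees.
text_corners = [(600.0, 300.0), (900.0, 300.0), (900.0, 380.0), (600.0, 380.0)]
H = nodal_pan_homography(1000.0, 1000.0, 640.0, 360.0, math.radians(2.0))
warped_corners = [warp_point(H, x, y) for x, y in text_corners]
```

Warping the four corners (and then the text image inside them) with the same matrix keeps the text glued to the background for the whole pan, with no 3D reconstruction at all.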
Once you start thinking about this effect, you’ll notice it in many places. Other examples include Zombieland, Watchmen, Scott Pilgrim, and Fringe. Even local law firm commercials! The same idea is also involved with inserting CGI objects into sports broadcasting; see the earlier post on visual effects for the Olympics.
(Not entirely related, but Art of the Title is a pretty cool site.)
Movies are often filmed on location, though this can be a complex and expensive process; streets need to be blocked off, permits need to be acquired, and so on. TV shows often can’t afford outdoor location filming, in terms of both time and money. On the other hand, there are outdoor shots in lots of TV shows; how do they do it? The answer is the extensive use of blue and green screens, which are replaced in post-production by realistic backgrounds. This is sometimes called the “virtual backlot”, as illustrated in this great demo reel from Stargate Studios:
Most of these blue/greenscreen effects are imperceptible to the viewer: everyday shots like two characters walking down a city street, or a character talking on their cell phone in front of a city skyline. Most of these shots are from TV shows that aren’t associated with flashy effects, like medical shows, law-and-order procedurals, and family comedies. I was especially impressed by the clip from Ugly Betty starting at about 2:12; hardly anything in this scene was “real”.
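For a rough idea of what replacing a green screen involves, here's a toy sketch of a keyer: it estimates an alpha matte from how much green dominates each pixel, then blends the foreground over a new background. This is a deliberately simple illustration, not what a studio would actually use. The `gain` parameter and the function names are made up, pixels are plain RGB tuples with values in [0, 1], and green spill on the foreground isn't handled at all. Production keyers do far more to cope with hair, motion blur, and uneven screen lighting.

```python
def green_screen_alpha(pixel, gain=4.0):
    """Estimate foreground opacity from how strongly green dominates.

    pixel is an (r, g, b) tuple with values in [0, 1]. gain is a
    hypothetical tuning knob: larger values make the key more aggressive.
    Returns 1.0 for clear foreground and 0.0 for clear green screen.
    """
    r, g, b = pixel
    greenness = g - max(r, b)
    return min(1.0, max(0.0, 1.0 - gain * greenness))

def composite(foreground, background, alpha):
    """Standard matting equation: alpha * F + (1 - alpha) * B per channel."""
    return tuple(alpha * f + (1.0 - alpha) * b
                 for f, b in zip(foreground, background))

def replace_background(fg_pixels, bg_pixels, gain=4.0):
    """Key each foreground pixel and blend it over the new background."""
    return [composite(f, b, green_screen_alpha(f, gain))
            for f, b in zip(fg_pixels, bg_pixels)]

# Hypothetical three-pixel "image": skin tone, pure screen green, and a
# soft edge pixel that mixes the two.
fg = [(0.8, 0.6, 0.5), (0.0, 1.0, 0.0), (0.4, 0.5, 0.4)]
bg = [(0.0, 0.0, 1.0)] * 3
result = replace_background(fg, bg)
```

The skin pixel passes through untouched, the pure green pixel is fully replaced by the background, and the edge pixel ends up as a blend. Those in-between pixels are where the difficult matting problems live.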
Here’s a longer look at the effects Stargate did for ABC’s Revenge, much of which takes place in houses near the ocean. In many cases, the camera isn’t moving much, which makes the problem easier, but there are a couple of shots that follow characters as they walk around a wrap-around porch that I thought were particularly impressive, starting at about 1:44 and 2:14:
In this case, some matchmoving is probably involved, as opposed to the pan/tilt shots where one can get away with different views of a spherical panorama. Keep in mind that these effects need to be turned around by the VFX company in a week (or less), so there isn’t that much time to polish the tiniest details like wisps of hair.
Post Magazine has a great article on the types of visual effects involved in last season’s new TV shows — not just bluescreens for background replacement but more advanced work like changing the season of a shot or adding CGI creatures.
High-quality facial motion capture for filmmaking (e.g., Rise of the Planet of the Apes, Avatar, TRON: Legacy) is usually done with a combination of visible marker dots and a head-mounted rig (on-set) and the MOVA Contour system of phosphorescent makeup (off-set). There’ll be a longer blog post on this later, but the video below from Digital Domain on the de-aging effect of Jeff Bridges in TRON: Legacy illustrates the idea.
However, video game developer Team Bondi took a different approach for their 2011 video game L.A. Noire. They created a custom multi-view stereo environment, pictured below, to capture the 3D face and hair of a large number of performers, which was later compressed and streamed directly into the game.
I started playing it last night and the effect is really striking! The video below explains the process in more detail with many examples from the game. The technology, called MotionScan, was created by a company called Depth Analysis. Unfortunately, Team Bondi is no longer around and it remains to be seen whether this approach will resurface in a new game or movie.
So you’re a computer vision researcher and you think: segmentation with a green background, feature tracking, structure from motion, how hard could creating visual effects be? Try putting your money where your mouth is with these free HD greenscreen videos created by Hollywood Camera Work. The source videos illustrate tough matting problems involving wispy hair, transparent clothing, and motion blur, as well as matchmoving problems at various ranges with different numbers of artificial tracking markers. There’s also a page with free natural-environment tracking videos that provide good practice on feature detection/tracking and matchmoving.
In addition to their intended use for helping VFX artists who are just starting out, these videos would also be a great resource for making homework problems in a course that uses my book. Found via Scott Squires’s great VFX blog.
If you’ve been watching television coverage of the London 2012 Olympics, you’ve probably seen plenty of impressive visual effects. The underlying technology for generating these effects is quite similar to that used for film and television production, with the caveat that all the effects have to be generated in real-time (or quickly enough to be shown in an instant replay).
The most common example, present in the coverage of almost every Olympic sport, is a motion graphic superimposed on the raw video that labels the lanes that runners or swimmers are in, or highlights what line a competitor has to beat in order to win a race. These types of effects have been around for a while, and are created using feature tracking and well-calibrated cameras that know their relationship to the 3D plane where the graphics should go (e.g., the track or pool surface).
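As a sketch of how a calibrated camera "knows its relationship" to the plane of the track or pool, suppose the playing surface is the world plane Z = 0 and the camera's calibration, rotation, and translation are known. Points on that plane then map to image pixels through a single 3x3 homography built from the calibration matrix, the first two columns of the rotation, and the translation. Everything below is an idealized illustration: the function names, focal length, camera pose, and lane geometry are invented for the example, and a broadcast system would also have to deal with lens distortion and with athletes occluding the graphic.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def plane_to_image_homography(f, cx, cy, R, t):
    """Homography from the world plane Z = 0 to image pixels.

    Assumes a pinhole camera with x_cam = R * x_world + t. For points
    with Z = 0 the third column of R drops out, leaving
    H = K * [r1 r2 t].
    """
    K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    r1_r2_t = [[R[i][0], R[i][1], t[i]] for i in range(3)]
    return matmul3(K, r1_r2_t)

def project_plane_point(h, X, Y):
    """Map a point (X, Y) on the plane to a pixel via the projective divide."""
    u = h[0][0] * X + h[0][1] * Y + h[0][2]
    v = h[1][0] * X + h[1][1] * Y + h[1][2]
    w = h[2][0] * X + h[2][1] * Y + h[2][2]
    return (u / w, v / w)

# Hypothetical setup: a camera 10 m from the surface, looking straight
# at it (identity rotation), with a 1000-pixel focal length.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 10.0]
H = plane_to_image_homography(1000.0, 640.0, 360.0, R, t)

# Endpoints (in meters) of an illustrative "line to beat" on the surface.
line_on_surface = [(-2.0, 1.0), (2.0, 1.0)]
line_in_image = [project_plane_point(H, X, Y) for X, Y in line_on_surface]
```

Once the line's endpoints are in pixel coordinates, the graphic can be drawn between them in every frame, and it stays locked to the surface as long as the camera parameters are updated along with the camera's motion.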
There’s a cool, more advanced effect being deployed in diving and gymnastics coverage in which the camera seems to swivel around the athlete in 3D as they’re frozen in mid-air (a la The Matrix). This is done with a combination of real-time foreground segmentation from multiple cameras, a fast multiview stereo algorithm, and a version of production visual effects software like the Foundry’s Nuke. The underlying approach is clearly explained in this video about the i3DLive system developed by the Foundry, the University of Surrey, and BBC Research and Development. Sorry I couldn’t figure out how to embed it here, but it’s really worth a look.
There’s a ton of interesting information on the BBC Research and Development web site. For example, check out these pages on augmented reality athletics, markerless motion capture for biomechanics, and all sorts of “Production Magic” techniques. These are great applications of computer vision for visual effects!