The 2015 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) will include a tutorial on Computer Vision for Visual Effects with several expert speakers from industry and academia. The tutorial will take place the day before the main conference begins, from 8:30 AM to 12:00 PM on June 7, 2015 at the Hynes Convention Center in Boston, Massachusetts. See more details at this page.
I’ve been lax on blogging, though I have a few topics in the wings… but in the meantime, I’m beginning to post YouTube videos from my Spring 2014 course at Rensselaer Polytechnic Institute based on the book, targeted at beginning graduate students and advanced undergraduates. Each roughly hour-long video recording covers 1-2 sections of the book and contains my voice over a screen capture of handwritten notes, figures from the book, running code on real images, and webpage/video views. New videos will be added once or twice a week and the complete set will be available in May 2014. Check them out at this page.
Here’s a talk I recently gave that overviews the field of computer vision and its applications to visual effects in movies and television:
The first 24 minutes are a general introduction to computer vision, why it’s difficult, and what kinds of problems computer vision researchers in academia and industry study. The rest of the talk overviews computer vision problems that are encountered in the design and production of visual effects, with lots of stills and videos from movies and TV. The main categories of problems (Matting, Image Compositing and Editing, Features, Dense Correspondence, Matchmoving, Motion Capture, and 3D Acquisition) parallel the chapters in my book.
Thanks to the RPI Cognitive Science Department for hosting and recording the talk.
Matchmoving (also known as camera tracking) is the first step of any visual effects problem in which CGI objects must appear to “live in” three-dimensional space. Often this involves fully-CGI characters interacting with actual sets or background plates (e.g., Optimus Prime battling in Chicago, Dobby in Harry Potter’s room, the Hulk smashing up New York City). However, there are many more subtle applications of matchmoving; one striking effect that’s been used many times recently is the insertion of 3D text into the credits of movies. Below are two examples from movies that you wouldn’t necessarily associate with visual effects (Easy A and Panic Room):
For Easy A in particular, in addition to matchmoving there was also a fair amount of matting and compositing required; for example, every time a person passes in front of the text, the edges of his/her body had to be outlined or rotoscoped. This was also probably a tricky shot considering its length (although since a single CGI object isn’t continuously present throughout the shot, they might have been able to get away with estimating the camera track in pieces).
One of my favorite uses of this effect is from Stranger Than Fiction, seen below.
Actually, in many of these shots the camera is stationary or purely panning/zooming. In the first case, the compositor could probably eyeball where to put the floating text in 3D and tie it to the motion of a single tracked point (like the end of the toothbrush or a point on Will Ferrell’s body). In the second case (also known as a “nodal pan”) the background pixels in any two frames are related by a projective transformation, so drawn text in one frame of a shot can be pushed to other frames of the shot. However, if the camera’s moving, even a little bit, matchmoving is required.
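To make the nodal pan case concrete, here is a minimal sketch of how text drawn in one frame could be pushed to another frame with a projective transformation (a homography). It assumes a pinhole camera that rotates about its optical center and may zoom, in which case the homography is H = K2 R K1^{-1}. The function names, focal lengths, and pan angle are hypothetical values chosen for illustration, not anything taken from the movies above. In practice the homography would more likely be estimated from tracked background features than from known camera parameters.

```python
import numpy as np

def intrinsics(f, cx, cy):
    """Pinhole intrinsic matrix with focal length f (pixels) and principal point (cx, cy)."""
    return np.array([[f, 0.0, cx],
                     [0.0, f, cy],
                     [0.0, 0.0, 1.0]])

def rotation_y(theta):
    """Rotation about the camera's vertical axis by theta radians (a pan)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def pan_homography(K1, K2, R):
    """Homography taking frame-1 pixels to frame-2 pixels for a camera that
    rotates by R about its optical center and changes intrinsics from K1 to K2."""
    return K2 @ R @ np.linalg.inv(K1)

def apply_homography(H, pts):
    """Map an (N, 2) array of pixel coordinates through the 3x3 homography H."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coordinates
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]             # back to pixel coordinates

# Hypothetical example: corners of a text box drawn in frame 1,
# pushed to a later frame after a 5-degree pan and a slight zoom.
K1 = intrinsics(1000.0, 640.0, 360.0)
K2 = intrinsics(1100.0, 640.0, 360.0)
H = pan_homography(K1, K2, rotation_y(np.deg2rad(5.0)))
text_corners = [[500.0, 300.0], [780.0, 300.0], [780.0, 380.0], [500.0, 380.0]]
print(apply_homography(H, text_corners))
```

Note that this only works because there is no translation of the camera center; once the camera moves, the mapping depends on scene depth and a single homography no longer relates the whole background.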
Once you start thinking about this effect, you’ll notice it in many places. Other examples include Zombieland, Watchmen, Scott Pilgrim, and Fringe. Even local law firm commercials! The same idea is also involved in inserting CGI objects into sports broadcasts; see the earlier post on visual effects for the Olympics.
(Not entirely related, but Art of the Title is a pretty cool site.)
Movies are often filmed on location, though this can be a complex and expensive process; streets need to be blocked off, permits need to be acquired, and so on. TV shows often can’t afford outdoor location filming, in terms of both time and money. On the other hand, there are outdoor shots in lots of TV shows; how do they do it? The answer is the extensive use of blue and green screens, which are replaced in post-production by realistic backgrounds. This is sometimes called the “virtual backlot”, as illustrated in this great demo reel from Stargate Studios:
Most of these blue/greenscreen effects are imperceptible to the viewer: everyday shots like two characters walking down a city street, or a character talking on their cell phone in front of a city skyline. Most of these shots are from TV shows that aren’t associated with flashy effects, like medical shows, law-and-order procedurals, and family comedies. I was especially impressed by the clip from Ugly Betty starting at about 2:12; hardly anything in this scene was “real”.
Here’s a longer look at the effects Stargate did for ABC’s Revenge, much of which takes place in houses near the ocean. In many cases, the camera isn’t moving much, which makes the problem easier, but there are a couple of shots that follow characters as they walk around a wrap-around porch that I thought were particularly impressive, starting at about 1:44 and 2:14:
In this case, some matchmoving is probably involved, as opposed to the pan/tilt shots where one can get away with different views of a spherical panorama. Keep in mind that these effects need to be turned around by the VFX company in a week (or less), so there isn’t that much time to polish the tiniest details like wisps of hair.
Post Magazine has a great article on the types of visual effects involved in last season’s new TV shows — not just bluescreens for background replacement but more advanced work like changing the season of a shot or adding CGI creatures.
High-quality facial motion capture for filmmaking (e.g., Rise of the Planet of the Apes, Avatar, TRON: Legacy) is usually done with a combination of visible marker dots and a head-mounted rig (on-set) and the MOVA Contour system of phosphorescent makeup (off-set). There’ll be a longer blog post on this later, but the video below from Digital Domain on the de-aging effect of Jeff Bridges in TRON: Legacy illustrates the idea.
However, video game developer Team Bondi took a different approach for their 2011 video game L.A. Noire. They created a custom multi-view stereo environment, pictured below, to capture the 3D face and hair of a large number of performers, which was later compressed and streamed directly into the game.
I started playing it last night and the effect is really striking! The video below explains the process in more detail with many examples from the game. The technology, called MotionScan, was created by a company called Depth Analysis. Unfortunately, Team Bondi is no longer around and it remains to be seen whether this approach will resurface in a new game or movie.