The advent of inexpensive digital image sensors has generated great interest in building sensing systems that incorporate large numbers of cameras. At the same time, advances in semiconductor technology have made increasing computing power available at decreasing cost, power, and package size. These trends raise a question: can we use clusters of inexpensive imagers and processors to create virtual cameras that outperform real ones? Can we combine large numbers of conventional images computationally to produce new kinds of images? In an effort to answer these questions, the Stanford Computer Graphics Laboratory has built an array of 100 CMOS-based cameras. The optics, physical spacing, and arrangement of the cameras are reconfigurable. Each camera consists of two custom boards: a smaller one containing a VGA-resolution Omnivision CMOS sensor and inexpensive lens, and a larger one containing a Motorola Coldfire microprocessor, a Sony MPEG2 video encoder, a Xilinx Field Programmable Gate Array (FPGA), and a Texas Instruments IEEE1394 (FireWire) chipset. The system is designed to return live, synchronized, lightly compressed (8:1 MPEG) video from all 100 cameras at once, and to record these video streams through 4 PCs to a striped disk array.
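To give a sense of why the recording path needs 4 PCs and a striped disk array, here is a back-of-the-envelope bandwidth estimate. The frame rate, uncompressed pixel format, and even split across PCs assumed below are illustrative guesses, not specifications from the project; only the VGA resolution, the roughly 8:1 MPEG compression, and the 100-camera, 4-PC configuration come from the description above.

```python
# Rough data-rate estimate for recording 100 compressed VGA video streams.
# FPS, bits per pixel, and the even per-PC split are assumptions for
# illustration only.

NUM_CAMERAS = 100
WIDTH, HEIGHT = 640, 480        # VGA resolution
FPS = 30                        # assumed frame rate
BITS_PER_PIXEL = 12             # assumed YUV 4:2:0 before compression
COMPRESSION_RATIO = 8           # ~8:1 MPEG2 compression
NUM_PCS = 4

raw_bps_per_camera = WIDTH * HEIGHT * FPS * BITS_PER_PIXEL
compressed_bps_per_camera = raw_bps_per_camera / COMPRESSION_RATIO

total_MBps = NUM_CAMERAS * compressed_bps_per_camera / 8 / 1e6
per_pc_MBps = total_MBps / NUM_PCS

print(f"Per camera:  {compressed_bps_per_camera / 1e6:.1f} Mbit/s")
print(f"Array total: {total_MBps:.0f} MB/s")
print(f"Per PC:      {per_pc_MBps:.0f} MB/s to the striped disk array")
```

Under these assumptions each camera produces roughly 14 Mbit/s after compression, or about 170 MB/s across the whole array, which is why the streams are split across several PCs writing to striped disks.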
Multi-camera systems can function in many ways, depending on the arrangement and aiming of the cameras. Some of the arrangements we have tried are pictured above. In particular, if the cameras are packed close together, then the system effectively functions as a single-center-of-projection synthetic camera, which we can configure to provide unprecedented performance along one or more imaging dimensions, such as resolution, signal-to-noise ratio, dynamic range, depth of field, frame rate, or spectral sensitivity. If the cameras are placed farther apart, then the system functions as a multiple-center-of-projection camera, and the data it captures is called a light field. Of particular interest to us are novel methods for estimating 3D scene geometry from the dense imagery captured by the array, and novel ways to construct multi-perspective panoramas from light fields, whether captured by this array or not. Finally, if the cameras are placed at an intermediate spacing, then the system functions as a single camera with a large synthetic aperture, which allows us to see through partially occluding environments like foliage or crowds. If we augment the array of cameras with an array of video projectors, we can implement a discrete approximation of confocal microscopy, in which objects not lying on a selected plane become both blurry and dark, effectively disappearing. These techniques, which we explore in our CVPR and SIGGRAPH papers (listed below), have potential application in scientific imaging, remote sensing, underwater photography, surveillance, and cinematic special effects.
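The synthetic-aperture idea mentioned above can be sketched in a few lines: if the camera images are calibrated and rectified so that focusing on a fronto-parallel plane reduces to translating each image in proportion to its camera's position in the array, then averaging the shifted images keeps objects on that plane sharp while smearing out occluders in front of it. The sketch below is a minimal illustration under those assumptions; the function name, parameters, and integer-pixel shifting are ours, not the project's actual pipeline, which would use sub-pixel homography warps.

```python
# Minimal shift-and-add synthetic-aperture refocusing sketch.
# Assumes pre-rectified grayscale images and a fronto-parallel focal plane;
# all names and parameters here are illustrative.

import numpy as np

def synthetic_aperture_focus(images, offsets, focal_shift):
    """Average all camera images after shifting each one toward the chosen
    focal plane; points on that plane align across cameras and stay sharp,
    while foreground occluders blur away.

    images      : list of HxW numpy arrays, one per camera
    offsets     : (N, 2) array of camera positions relative to a reference
                  camera, in camera-plane coordinates
    focal_shift : scalar mapping camera offset to pixel shift; sweeping it
                  moves the synthetic focal plane through the scene
    """
    acc = np.zeros_like(images[0], dtype=np.float64)
    for img, (dx, dy) in zip(images, offsets):
        shift_x = int(round(dx * focal_shift))
        shift_y = int(round(dy * focal_shift))
        # Integer-pixel shift via np.roll keeps the sketch dependency-free;
        # a real implementation would warp with sub-pixel accuracy.
        acc += np.roll(img, (shift_y, shift_x), axis=(0, 1))
    return acc / len(images)
```

Sweeping `focal_shift` over a range of values produces a focal stack, from which a plane behind foliage or a crowd can be selected and examined.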
Construction of this array was funded by Intel, Sony, and Interval Research, as part of the Stanford Immersive Television Project. Research using the array was funded by the National Science Foundation and DARPA. Applications of camera and projector arrays were one focus of the Spring 2004 and Winter 2006 versions of our Topics in Computer Graphics course (CS 448). Although the array still works, this project is currently inactive.
(Slides for individual papers may also be available on those papers' web pages.)
A list of technical papers, with abstracts and pointers to additional information, is also available.