The advent of inexpensive digital image sensors has generated great interest in building sensing systems that incorporate large numbers of cameras. At the same time, advances in semiconductor technology have made increasing computing power available at decreasing cost, power, and package size. These trends raise a question: can we use clusters of inexpensive imagers and processors to create virtual cameras that outperform real ones? Can we combine large numbers of conventional images computationally to produce new kinds of images? In an effort to answer these questions, the Stanford Computer Graphics Laboratory has built an array of 100 CMOS-based cameras. The optics, physical spacing, and arrangement of the cameras are reconfigurable. Each camera consists of two custom boards: a smaller one containing a VGA-resolution Omnivision CMOS sensor and inexpensive lens, and a larger one containing a Motorola Coldfire microprocessor, a Sony MPEG2 video encoder, a Xilinx Field Programmable Gate Array (FPGA), and a Texas Instruments IEEE1394 (FireWire) chipset. The system is designed to return live, synchronized, lightly compressed (8:1 MPEG) video from all 100 cameras at once, and to record these video streams through 4 PCs to a striped disk array.
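To give a sense of why the recording path needs 4 PCs and a striped disk array, here is a back-of-the-envelope bandwidth estimate. The frame rate, uncompressed pixel format, and even split across PCs assumed below are illustrative guesses, not specifications from the project; only the VGA resolution, the roughly 8:1 MPEG compression, and the 100-camera, 4-PC configuration come from the description above.

```python
# Rough data-rate estimate for recording 100 compressed VGA video streams.
# FPS, bits per pixel, and the even per-PC split are assumptions for
# illustration only.

NUM_CAMERAS = 100
WIDTH, HEIGHT = 640, 480        # VGA resolution
FPS = 30                        # assumed frame rate
BITS_PER_PIXEL = 12             # assumed YUV 4:2:0 before compression
COMPRESSION_RATIO = 8           # ~8:1 MPEG2 compression
NUM_PCS = 4

raw_bps_per_camera = WIDTH * HEIGHT * FPS * BITS_PER_PIXEL
compressed_bps_per_camera = raw_bps_per_camera / COMPRESSION_RATIO

total_MBps = NUM_CAMERAS * compressed_bps_per_camera / 8 / 1e6
per_pc_MBps = total_MBps / NUM_PCS

print(f"Per camera:  {compressed_bps_per_camera / 1e6:.1f} Mbit/s")
print(f"Array total: {total_MBps:.0f} MB/s")
print(f"Per PC:      {per_pc_MBps:.0f} MB/s to the striped disk array")
```

Under these assumptions each camera produces roughly 14 Mbit/s after compression, or about 170 MB/s across the whole array, which is why the streams are split across several PCs writing to striped disks.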
Multi-camera systems can function in many ways, depending on the arrangement and aiming of the cameras. Some of the arrangements we have tried are pictured above. In particular, if the cameras are packed close together, then the system effectively functions as a single-center-of-projection synthetic camera, which we can configure to provide unprecedented performance along one or more imaging dimensions, such as resolution, signal-to-noise ratio, dynamic range, depth of field, frame rate, or spectral sensitivity. If the cameras are placed farther apart, then the system functions as a multiple-center-of-projection camera, and the data it captures is called a light field. Of particular interest to us are novel methods for estimating 3D scene geometry from the dense imagery captured by the array, and novel ways to construct multi-perspective panoramas from light fields, whether captured by this array or not. Finally, if the cameras are placed at an intermediate spacing, then the system functions as a single camera with a large synthetic aperture, which allows us to see through partially occluding environments like foliage or crowds. If we augment the array of cameras with an array of video projectors, we can implement a discrete approximation of confocal microscopy, in which objects not lying on a selected plane become both blurry and dark, effectively disappearing. These techniques, which we explore in our CVPR and SIGGRAPH papers (listed below), have potential application in scientific imaging, remote sensing, underwater photography, surveillance, and cinematic special effects.
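The synthetic-aperture idea mentioned above can be sketched in a few lines: if the camera images are calibrated and rectified so that focusing on a fronto-parallel plane reduces to translating each image in proportion to its camera's position in the array, then averaging the shifted images keeps objects on that plane sharp while smearing out occluders in front of it. The sketch below is a minimal illustration under those assumptions; the function name, parameters, and integer-pixel shifting are ours, not the project's actual pipeline, which would use sub-pixel homography warps.

```python
# Minimal shift-and-add synthetic-aperture refocusing sketch.
# Assumes pre-rectified grayscale images and a fronto-parallel focal plane;
# all names and parameters here are illustrative.

import numpy as np

def synthetic_aperture_focus(images, offsets, focal_shift):
    """Average all camera images after shifting each one toward the chosen
    focal plane; points on that plane align across cameras and stay sharp,
    while foreground occluders blur away.

    images      : list of HxW numpy arrays, one per camera
    offsets     : (N, 2) array of camera positions relative to a reference
                  camera, in camera-plane coordinates
    focal_shift : scalar mapping camera offset to pixel shift; sweeping it
                  moves the synthetic focal plane through the scene
    """
    acc = np.zeros_like(images[0], dtype=np.float64)
    for img, (dx, dy) in zip(images, offsets):
        shift_x = int(round(dx * focal_shift))
        shift_y = int(round(dy * focal_shift))
        # Integer-pixel shift via np.roll keeps the sketch dependency-free;
        # a real implementation would warp with sub-pixel accuracy.
        acc += np.roll(img, (shift_y, shift_x), axis=(0, 1))
    return acc / len(images)
```

Sweeping `focal_shift` over a range of values produces a focal stack, from which a plane behind foliage or a crowd can be selected and examined.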
Construction of this array was funded by Intel, Sony, and Interval Research, as part of the Stanford Immersive Television Project. Research using the array was funded by the National Science Foundation and DARPA. Applications of camera and projector arrays were one focus of the Spring 2004 and Winter 2006 versions of our Topics in Computer Graphics course (CS 448). Although the array still works, this project is currently inactive.
(Slides for individual papers may also be available on those papers' web pages.)
A list of technical papers, with abstracts and pointers to additional information, is also available.