Some of the items most recently deposited to the Stanford Digital Repository (SDR) include thousands of images that are nothing short of, well, ordinary. For instance, in the Names 100 Dataset, you can download a folder containing 80,000 small images depicting the faces of ordinary people. In another case, there are millions of snapshots of San Francisco street scenes and buildings. Each image is notable for its lack of distinction. It’s as if anyone could have captured these images using their smartphone. And that is precisely the point.
This is the work of the Image, Video, and Multimedia Systems group, a research team in the Department of Electrical Engineering, led by Professor Bernd Girod and an impressive group of graduate students with a respectable publishing track record.
These researchers are exploring technologies to realize the promise of image and video search, with a particular focus on mobile devices. They create these datasets to develop and demonstrate their methods for automatic recognition of landmarks or a person’s gender or age, and then share the datasets with peers in their field so that others can do the same.
I asked David Chen, a PhD candidate working with Girod and an active SDR depositor, some questions about this work and where the SDR fits in. Here are his replies.
Q: Can you please describe your research and how these datasets support it and your teaching?
A: "Our research has been focused in the area of mobile visual search. Handheld mobile devices, such as camera phones or tablets, are becoming ubiquitous platforms for visual search and augmented reality applications. For mobile image matching, a visual database is typically stored at a server in the network. Hence, for a visual comparison, information must be either uploaded from the mobile to the server, or downloaded from the server to the mobile. With relatively slow wireless links, the response time of the system critically depends on how much information must be transferred in both directions. Our research focuses on mobile image matching with a 'bag-of-visual-words' approach based on robust feature descriptors. We have demonstrated that dramatic speed-ups and power savings are possible by considering recognition, compression, and retrieval jointly. Our real-time implementations for different example applications, such as recognition of landmarks, media covers or printed documents, show the benefits of implementing computer vision algorithms on the phone, the server, or both.
"To advance the development of algorithms for mobile visual search, we have developed the following datasets:
- Stanford Mobile Visual Search Dataset
- Stanford Streaming Mobile Augmented Reality Dataset
- San Francisco Landmark Dataset
All three datasets have been published in the computer vision and signal processing literature and are actively being used by researchers and practitioners of mobile visual search to compare their new methods."
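For readers unfamiliar with the "bag-of-visual-words" idea David mentions, here is a minimal Python sketch of the general technique. It is not the group's implementation. The tiny two-dimensional "descriptors", the three-entry codebook, and the names `nearest_word`, `bag_of_words`, `cosine_similarity`, and `best_match` are all made up for illustration; real systems use high-dimensional feature descriptors and much larger vocabularies. The idea is that each image is reduced to a compact histogram of "visual words", and images are matched by comparing histograms rather than raw pixels.

```python
import math
from collections import Counter


def nearest_word(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(descriptor, codebook[i]))


def bag_of_words(descriptors, codebook):
    """Quantize descriptors into visual words; return a normalized histogram."""
    counts = Counter(nearest_word(d, codebook) for d in descriptors)
    total = sum(counts.values())
    return [counts[i] / total for i in range(len(codebook))]


def cosine_similarity(a, b):
    """Cosine similarity between two histograms (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(query_descriptors, database, codebook):
    """Return the name of the database image most similar to the query."""
    query_hist = bag_of_words(query_descriptors, codebook)
    return max(
        database,
        key=lambda name: cosine_similarity(
            query_hist, bag_of_words(database[name], codebook)
        ),
    )


# Hypothetical toy data: a 3-word codebook and two "database images".
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
database = {
    "landmark_a": [(0.0, 0.1), (1.1, 0.0), (0.9, 0.0)],
    "landmark_b": [(0.0, 1.0), (0.1, 0.9), (0.0, 0.8)],
}
query = [(0.1, 0.0), (0.9, 0.1), (1.0, 0.0)]

print(best_match(query, database, codebook))
```

In this toy setup the query shares its visual words with `landmark_a` and none with `landmark_b`, so `landmark_a` is the best match. Because a histogram is far smaller than the photo it summarizes, this kind of representation also hints at why the amount of data sent over a slow wireless link can be reduced.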
Q: Can you comment on how the SDR has facilitated your work, or any other benefits that this archiving service has provided you?
A: "Hosting the datasets through the SDR is a very convenient option. We do not have to worry about finding space on our own servers to store the data, about the data being deleted over time by new generations of researchers, or about supporting large volumes of web traffic from users trying to download the data. The permanent URL we can assign to a dataset is incredibly useful. We can put this permanent URL in publications and reports and have the confidence that the URL will remain valid for many years to come."
We wish David and team the best of luck in their exciting work!