CS547 Human-Computer Interaction Seminar  (Seminar on People, Computers, and Design)

Fridays 12:30-1:50 · Gates B01 · Open to the public
Hamid Aghajan
Stanford Dept. of Electrical Engineering
Human-centered Vision Systems: Ideas for enabling Ambient Intelligence and serving Social Networks
November 13, 2009


Vision offers rich information about events involving human activities in user-centric applications, from gesture recognition to occupancy reasoning. Multi-camera vision allows for applications based on 3D perception and reconstruction, offers opportunities for collaborative decision making, and enables hybrid processing through the assignment of tasks to different cameras based on their views.

In addition to the inherent complexities of vision processing stemming from perspective views and occlusions, setup and calibration requirements have challenged the creation of meaningful applications that can operate in uncontrolled environments. Moreover, the study of user acceptance criteria such as privacy management, and of the implications of visual ambient communication, has for the most part stayed outside the realm of technology design, further hindering the roll-out of vision-based applications in spite of the available sensing, processing, and networking technologies.

The output of visual processing often consists of instantaneous measurements such as location and pose, enabling the vision module to supply quantitative knowledge to higher levels of reasoning. The extracted information is not always flawless and often needs further interpretation at a data fusion level. Also, while quantitative knowledge is essential in many smart-environment applications such as gesture control and accident detection, most ambient intelligence applications must also depend on qualitative knowledge accumulated over time in order to learn the user's behavior models and adapt their services to the preferences the user states explicitly or implicitly.
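
To make the quantitative/qualitative distinction concrete, here is a minimal sketch of how instantaneous vision measurements could be accumulated into a simple behavior model; the BehaviorModel class, its methods, and the room and hour values are illustrative assumptions, not the speaker's system.

from collections import defaultdict

class BehaviorModel:
    """Turns instantaneous vision measurements (user seen in a room)
    into qualitative knowledge: which room the user prefers per hour."""

    def __init__(self):
        # counts[hour][room] -> number of sightings (illustrative structure)
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, hour, room):
        # One quantitative measurement reported by the vision module.
        self.counts[hour][room] += 1

    def preferred_room(self, hour):
        # Qualitative knowledge accumulated over many observations.
        rooms = self.counts.get(hour)
        return max(rooms, key=rooms.get) if rooms else None

model = BehaviorModel()
for hour, room in [(20, "living room"), (20, "living room"), (20, "kitchen")]:
    model.observe(hour, room)
print(model.preferred_room(20))  # -> "living room"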

Proper interfacing of vision to high-level reasoning allows for the integration of information arriving at different times and from different cameras, and for application-level interpretation according to the associated confidence levels, the available contextual data, and the knowledge base accumulated from the user's history and behavior model.
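
As one hedged illustration of such integration, the sketch below fuses per-camera location estimates by weighting each with its confidence; the function name and the numeric values are hypothetical, chosen only to show the idea.

def fuse_locations(observations):
    """Confidence-weighted average of (x, y, confidence) estimates
    reported by different cameras, possibly at different times."""
    total = sum(conf for _, _, conf in observations)
    if total == 0:
        return None  # no usable information yet
    x = sum(px * conf for px, _, conf in observations) / total
    y = sum(py * conf for _, py, conf in observations) / total
    return (x, y)

# Hypothetical estimates: a partly occluded camera reports low confidence
# and therefore barely shifts the fused location.
estimates = [(2.1, 3.0, 0.9), (2.4, 2.8, 0.6), (5.0, 1.0, 0.1)]
print(fuse_locations(estimates))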

This talk presents ideas for interfacing vision to other modules to enable real applications in the presence of imperfect vision processing output. A number of data fusion models and potential applications involving the recognition of user activities are discussed, namely environment discovery based on user interaction, context-based ambience control services, a speaker assistance system, and experience sharing using avatars.



Hamid Aghajan has been a consulting professor of Electrical Engineering at Stanford University since 2003. His group's research areas include multi-camera networks and human interfaces for ambient intelligence and smart environments, with applications to smart homes, occupancy-based services, assisted living and well-being, ambience control, smart meetings and speaker assistance systems, and avatar-based communication and social interaction. Hamid is editor-in-chief of the Journal of Ambient Intelligence and Smart Environments. He has edited three volumes: Multi-Camera Networks: Principles and Applications, Human-centric Interfaces for Ambient Intelligence, and Handbook of Ambient Intelligence and Smart Environments. He has chaired workshops at ICCV, ECCV, ECAI, ACM Multimedia, and ICMI-MLMI, and has offered short courses and tutorials at CVPR, ICCV, ICASSP, and ICDSC. Hamid received his Ph.D. in electrical engineering from Stanford University in 1995.