SpringerOpen Newsletter

Receive periodic news and updates relating to SpringerOpen.

This article is part of the series Patches in Vision.

Open Access Open Badges Research Article

Unsupervised Modeling of Objects and Their Hierarchical Contextual Interactions

Devi Parikh* and Tsuhan Chen

Author Affiliations

Department of Electrical and Computer Engineering, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA

For all author emails, please log on.

EURASIP Journal on Image and Video Processing 2009, 2009:184618  doi:10.1155/2009/184618

The electronic version of this article is the complete one and can be found online at: http://jivp.eurasipjournals.com/content/2009/1/184618

Received:11 June 2008
Accepted:2 September 2008
Published:26 January 2009

© 2009 The Author(s).

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


A successful representation of objects in literature is as a collection of patches, or parts, with a certain appearance and position. The relative locations of the different parts of an object are constrained by the geometry of the object. Going beyond a single object, consider a collection of images of a particular scene category containing multiple (recurring) objects. The parts belonging to different objects are not constrained by such a geometry. However, the objects themselves, arguably due to their semantic relationships, demonstrate a pattern in their relative locations. Hence, analyzing the interactions among the parts across the collection of images can allow for extraction of the foreground objects, and analyzing the interactions among these objects can allow for a semantically meaningful grouping of these objects, which characterizes the entire scene. These groupings are typically hierarchical. We introduce hierarchical semantics of objects (hSO) that captures this hierarchical grouping. We propose an approach for the unsupervised learning of the hSO from a collection of images of a particular scene. We also demonstrate the use of the hSO in providing context for enhanced object localization in the presence of significant occlusions, and show its superior performance over a fully connected graphical model for the same task.

Publisher note

To access the full article, please see PDF.