One of the major success stories of computer vision over the 1990s and 2000s was the development of systems that can simultaneously build a map of the environment and localize a camera with respect to that environment. Within a batch framework this is usually known as structure from motion or multi-view reconstruction, and systems that can reconstruct city-scale (or even wider-scale) environments with accurate geometry are now common, whether from video or from a less well-ordered set of images. Within the robotics and real-time vision communities this problem is often referred to as visual SLAM: Simultaneous Localization and Mapping. The current state-of-the-art visual SLAM systems can reconstruct large areas either densely or semi-densely (i.e. with a depth estimate at all or many pixels) with high accuracy, in real time, using just a single camera.
As impressive as these systems and algorithms are, they have no understanding of the scenes they observe: at best they provide a dense geometric point cloud, and they fall well short of the human visual system's ability to assimilate high-level visual information. In contrast to much of the work in multi-view analysis, those working with single images have made significant progress in applications such as segmentation and object and place recognition, so that an image can be labelled with high-level designations of its content, explaining not just the geometry but also the semantics of the image. Nevertheless, these algorithms are often slow, require offline learning that does not necessarily adapt or transfer between environments, and, importantly, have no temporal component.
Within the last few years a number of researchers in computer vision and robotics have recognised the benefits of applying this semantic-level analysis to image sequences acquired by a moving camera, and have proposed a shift from purely geometric reconstruction and mapping to semantic-level descriptions of scenes involving objects, surfaces, attributes and scene relations that together capture an understanding of the scene. This shift will enable recognition of long-term change through maps that are more versatile, informative and compact, and it is likely that advances in this area will yield the greatest gains in the quest for long-term autonomy.
This workshop will bring together interested researchers in multi-view geometry, scene understanding and robotic vision with the common goal of discussing and presenting the state of the art in the use of semantics in structure from motion and robotic vision, and of considering the most fruitful and challenging areas for the development of the field. The workshop will comprise invited talks and a limited number of submitted talks, selected for presentation by the organising committee. Topics to be addressed will include:
The following are confirmed invited speakers:
The following program is tentative but indicative:
| Time | Session |
|-------|---------|
| 9:00 | Welcome and Introduction |
| 9:15 | Invited talk: Martial Hebert, Semantically-referenced navigation |
| 9:50 | Invited talk: Raquel Urtasun, Exploiting the Web for Reconstruction, Recognition and Self-localization |
| 10:25 | Arsalan Mousavian and Jana Kosecka, Semantically Aware Bag-of-Words for Localization |
| 11:00 | Ankur Handa, Viorica Patraucean, Vijay Badrinarayanan, Simon Stent and Roberto Cipolla, SynthCam: Semantic Understanding With Synthetic Indoor Scenes |
| 11:15 | Yinda Zhang, Shuran Song, Ping Tan and Jianxiong Xiao, PanoContext: A Whole-room 3D Context Model for Panoramic Scene Understanding |
| 11:30 | Invited talk: John Leonard, Challenges for Life-Long Visual Mapping and Navigation |
| 12:05 | Invited talk: Ashutosh Saxena, RoboBrain: Large-Scale Knowledge Engine for Robots |
| 13:40 | Cancelled: Richard Newcombe, What do we hope to achieve with visual SLAM? |
| 14:05 | Fisher Yu, Jianxiong Xiao and Thomas Funkhouser, Semantic Alignment of LiDAR Data at City Scale |
| 14:20 | Hayko Riemenschneider, Andras Bodis-Szomoru, Andelo Martinovic, Julien Weissenberg and Luc Van Gool, What is needed for Multi-view Semantic Segmentation? |
| 15:10 | Jinglu Wang, Showei Li, Jongbo Liu, Honghui Zhang, Tian Fang, Siyu Zhu, Punze Zhang, Shengnan Cai and Long Quan, Semantic Segmentation of Large-scale Urban 3D data with Low Annotation Cost |
| 15:25 | Invited talk: Marc Pollefeys, Semantic 3D Reconstruction |
| 16:00 | Discussion / wrap-up |