ICRA 2015 Workshop on Robotic Vision: challenges and opportunities

A half day workshop held on the morning of Tuesday 26 May in Ballroom 6C.

The ICRA 2015 workshops and tutorials page is here.

Post workshop

The workshop was a great success: 320 people registered, and it was certainly a very big crowd. The presented papers were excellent and set the scene well for a lively discussion with the panel and audience. Thanks to all who contributed. We will try to synthesise some conclusions from the discussions and post them here soon.


Starter talks


Reinforcement learning provides only a weak supervisory signal, posing additional challenges in the form of temporal credit assignment and exploration.  Nevertheless, deep reinforcement learning has already enabled learning to play Atari games from raw pixels (without access to the underlying game state) and carries the promise of end-to-end training of control policies with millions of parameters.
I will discuss major challenges for, as well as some preliminary promising results towards, making deep reinforcement learning applicable to real robotic problems.
Led by the efforts of Google and many leading automobile manufacturers, interest in the field of self-driving vehicles has surged in the past several years. Predictions abound that fully self-driving cars will arrive in just a few years. In particular, Elon Musk recently said that he viewed self-driving as "a solved problem", saying in a talk at an Nvidia event in March 2015 that "Autonomous Cars [are] not something that I think is very difficult; actually I think to do autonomous driving to a degree that is much safer than a person is much easier than people think....its going to just become normal, like an elevator". In an earlier statement (October 2014), Musk said "maybe five or six years from now I think we'll be able to achieve true autonomous driving where you could literally get in the car, go to sleep and wake up at your destination". In this talk, I will provide examples, based on driving data captured in Boston, MA, of massively hard vision and motion planning challenges that are far from solved. These include making left turns across high-speed traffic, obeying the gestures of police officers and crossing guards, and operating in difficult weather. I will offer some thoughts on research strategies that might help address some of these issues.
3. Geometry vs. Appearance, Tim Barfoot
In a lot of robotic vision tools, we seem to treat appearance and geometry information somewhat separately (e.g., sequentially). For example, we often use feature descriptors (appearance) to get 3D point correspondences, then use these to get pose change (geometry). There are relatively few examples (there are some) where we treat appearance and geometry simultaneously or jointly. Why is this? What led us to this separation of geometry and appearance information? Can we do better by coming up with a more sophisticated way of treating geometry and appearance together?

4. Natural vision: Pillage or Pass Over?  Michael Milford, QUT

Robotics has a long history of drawing inspiration, to varying degrees, from nature. Nature is, after all, the ultimate proof of concept of amazing autonomous agents doing incredible things which we would all like our robots to emulate. I will discuss some particularly interesting ways in which robotic vision and natural vision systems are similar and ways in which they diverge, from both an algorithmic and a hardware perspective, in order to kickstart discussion on the question: what do we want our robot vision systems to do?


5. The 100-100 Tracking Challenge, Dieter Fox, U. Washington
Imagine a robot operating alongside people in a task-based environment, such as a kitchen, a research wetlab, or a manufacturing space. Let's assume the robot has 3D models of the environment and all relevant objects, including people. Let's further assume that we can place several depth cameras on the robot and in the environment. Under these assumptions, the 100-100 Tracking Challenge is to track and identify 100% of the motion of all objects and people with 100% accuracy. I will discuss why I believe that it might be useful to solve this challenge, why solving this challenge might be within reach, and why it's not trivial.

Photos from the day

Special thanks to our anonymous photographer (you know who you are!)


Pre workshop



8:30-8:40 Welcome and motivation (Greg and Peter)

Invited talks: the case for and against vision

Open call talks: Audacious ideas for robotic vision

10:00-10:30 Coffee break
10:30-12:00 Panel: an agenda for progressing robotic vision. What can we do, and what's still to do?


Short URL for this page is http://tiny.cc/robovisionws

Panel discussion

Using the invited talks and our own experiences as input, we will try to synthesise an agenda for progressing robotic vision. What can we do, and what's still to do? What should we work on, and what should we stop working on?
Please come to the panel armed with:
  • opinions
  • good questions
  • a constructive attitude

About the workshop


The objectives of the workshop are to:

  • Inform the robotics community about the state of the art in computer vision (they are doing some awesome stuff)
  • Share experiences and ideas about what's good, and what's not, with current robotic vision 
  • Run a panel discussion to flesh out a roadmap of the challenges and open questions for computer vision, from a robotics perspective. 


The technologies of robotics and computer vision are each over 50 years old. Once upon a time they were closely related and investigated, separately and together, in AI labs around the world. Vision has always been a hard problem, and early roboticists struggled to make vision work using the slow computers of the day — particularly for metric problems like understanding the geometry of the world. In the 1990s affordable laser rangefinders entered the scene and roboticists adopted them with enthusiasm, delighted with the metric information they could provide. Since that time laser-based perception has come to dominate robotics, while processing images from databases, not from robots, has come to dominate computer vision. What happened to that early partnership between robotics and vision? Is it forever broken, or is now the time to reconsider vision as an effective sensor for robotics?


Corke’s plenary talk at IROS14 was concerned with the split between the robotics and computer vision communities. Roboticists have largely adopted LIDAR and RGBD sensors, whereas the computer vision community is content to process images from non-real-time sources such as databases. Although “computer vision” is a popular keyword at robotics conferences, the converse is not true: there are very few robotics papers at computer vision conferences. A show of hands during the plenary indicated that fewer than 20 of the 1600 IROS delegates attend mainstream computer vision conferences such as ICCV or CVPR. There is much to be gained from the use of computer vision as the primary sensing modality for robots. This discipline split, and how to rectify it, seemed to touch a nerve, and I was approached by many people who shared the belief that it is an issue that should be addressed. This workshop will work to bridge the gap by presenting state-of-the-art techniques from the computer vision community to the robotics community, and by holding a panel discussion to flesh out a roadmap of the challenges and open questions for computer vision, from a robotics perspective.


Similar conversations are going on about the relationship between AI research and robotics, and how to close the gap between the two communities. A recent AAAI workshop discussed this; its website includes slides of the talks, and the participants also developed a position paper.



The workshop is organized by Peter Corke (Queensland University of Technology) and Tom Drummond (Monash University), both with the Australian Centre for Robotic Vision, and Greg Hager (Johns Hopkins University).