Learning efficient representations for image and video understanding

The Image Processing Group has organised a talk by Yannis Kalantidis (formerly at Facebook AI, Menlo Park).

Two important challenges in image and video understanding are designing more effective and efficient deep Convolutional Neural Networks and learning models that can achieve higher-level understanding. In this talk, I will present some of my recent work towards tackling these challenges. Specifically, I will present Global Reasoning Networks [CVPR 2019], a new approach for reasoning over arbitrary sets of input features by projecting them from a coordinate space into an interaction space where relational reasoning can be computed efficiently. I will also introduce Octave Convolution [ICCV 2019], a plug-and-play replacement for the convolution operator that exploits the spatial redundancy of CNN activations and can be used without any adjustments to the network architecture. The two methods are complementary and achieve state-of-the-art performance on both image and video tasks. Aiming for higher-level understanding, I will also present our recent work on vision and language modelling, specifically on learning state-of-the-art image and video captioning models that are also able to better visually ground the generated sentences [CVPR 2019, arXiv 2019]. The talk will conclude with current research and a brief vision for the future.

For the last three years, Yannis Kalantidis was a research scientist at Facebook AI in Menlo Park, California. He grew up in Athens, Greece, and lived there until 2015, with brief breaks in Sweden, Spain and the United States. He received his PhD on large-scale search and clustering from the National Technical University of Athens in 2014. He was a postdoc and research scientist at Yahoo Research in San Francisco from 2015 until 2017, where he led the visual similarity search project at Flickr and participated in the Visual Genome dataset efforts with Stanford. He is currently conducting research on representation learning, video understanding and modelling of vision and language.

If you are planning to attend, please provide your e-mail in this form.

When: Thursday 21st November, 12 pm
Where: Campus Nord UPC, Building A3, Aula Màster (basement)