Hugs and kisses exchanged under the mistletoe are among the human interactions that computers can now recognise automatically in video footage, thanks to new research.
The technology, developed at Oxford University, can also recognise interactions such as handshakes and high fives. It is part of research to enable computers to automatically analyse the content of the vast amount of video footage generated by sources such as TV, films, YouTube and CCTV.
‘Human actions and activities are of central importance in video analysis,’ said Alonso Patron-Perez of Oxford University’s Department of Engineering Science, who led the research. ‘This new work makes it possible to recognise two-person human interactions, such as hugs, kisses and handshakes, automatically. Once you can recognise these interactions, the applications are numerous: for instance, you could automatically search home videos and YouTube for kisses and handshakes, or even fast-forward CCTV to find incidents.’
The method, developed by an Oxford University team including Alonso Patron-Perez, Dr Ian Reid, Dr Marcin Marszalek, and Professor Andrew Zisserman, is built on algorithms from computer vision and machine learning.
Teaching computers to recognise the interactions involves a number of steps. First, humans are detected and tracked through the video footage. Then, once the positions of the humans in the video are established, cues such as head orientation and the relative motion of people’s bodies are computed to determine whether an interaction occurs and, if so, what kind of interaction it is.
All this information is computed for several examples of each interaction (the team has focused on four interactions so far: handshakes, high fives, hugs and kisses), and machine learning methods are then used to learn a model for each interaction from these examples.
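The steps described above can be illustrated with a toy sketch in Python. It is not the Oxford team's method: the feature choices (closest distance, how far the gap closes, and whether one person's head points at the other), the nearest-centroid classifier, and all names such as `PersonFrame`, `pair_cues` and `synthetic_clip` are assumptions made for illustration. The clips are synthetic stand-ins for the output of a real person detector and tracker.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class PersonFrame:
    x: float           # horizontal position of the tracked person
    y: float           # vertical position of the tracked person
    head_angle: float  # head orientation in radians (0 = facing +x)


def pair_cues(track_a, track_b):
    """Summarise a two-person clip as [closest distance, approach, facing]."""
    dists = [math.hypot(a.x - b.x, a.y - b.y) for a, b in zip(track_a, track_b)]
    # Relative motion cue: how much the gap between the people closes.
    approach = dists[0] - dists[-1]
    # Head cue: 1.0 when person A's head points straight at person B.
    facing = mean(
        math.cos(a.head_angle - math.atan2(b.y - a.y, b.x - a.x))
        for a, b in zip(track_a, track_b)
    )
    return [min(dists), approach, facing]


def learn_models(examples):
    """Learn one model (here, a feature centroid) per interaction label."""
    by_label = {}
    for label, feats in examples:
        by_label.setdefault(label, []).append(feats)
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }


def recognise(models, feats):
    """Return the label whose centroid is nearest to the clip's cues."""
    return min(models, key=lambda label: math.dist(models[label], feats))


def synthetic_clip(start_gap, end_gap, frames=5):
    """Hypothetical tracker output: two people facing each other and closing in."""
    track_a, track_b = [], []
    for i in range(frames):
        gap = start_gap + (end_gap - start_gap) * i / (frames - 1)
        track_a.append(PersonFrame(-gap / 2, 0.0, 0.0))      # faces +x
        track_b.append(PersonFrame(gap / 2, 0.0, math.pi))   # faces -x
    return track_a, track_b


# Made-up training examples: hugs end much closer together than handshakes.
training = [
    ("hug", pair_cues(*synthetic_clip(2.0, 0.1))),
    ("hug", pair_cues(*synthetic_clip(1.8, 0.2))),
    ("handshake", pair_cues(*synthetic_clip(2.0, 0.8))),
    ("handshake", pair_cues(*synthetic_clip(1.9, 0.9))),
]
models = learn_models(training)

# Classify a new, unseen clip using the learned models.
print(recognise(models, pair_cues(*synthetic_clip(2.1, 0.15))))
```

A real system would need far richer cues and many more examples per interaction; the sketch only shows the overall shape of the pipeline: track people, compute cues, learn a model per interaction, then apply the models to new footage.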
Alonso Patron-Perez said: ‘Once a computer has learnt these models, human interactions can then be located and recognised in new videos, with the computer able to determine when in the video interactions occur, which people are interacting and what kind of interactions are involved. This work enables computers to make sense of how people are behaving in video footage in a way that has simply not been possible before.’