These are some of the detections produced by our algorithm on consumer videos.
Our approach can also generate proposals for non-human actors on the A2D dataset. The left and center panels show qualitative visualizations of action proposals for two non-human actors, Bird and Ball. The right panel shows recall at IoU = 0.5 for all 8 actor classes. Recall is consistently high for every class except Ball, which is understandable given its common shape and small size, which make it prone to frequent occlusion.
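To make the "recall at IoU = 0.5" metric concrete, here is a minimal sketch under stated assumptions; it is not the evaluation code used for these results. It assumes a tube is a mapping from frame index to an `(x1, y1, x2, y2)` box, and it uses one common spatio-temporal overlap definition: the per-frame box IoU averaged over the union of frames covered by either tube, with frames covered by only one tube counting as zero. A ground-truth tube counts as recalled if at least one proposal overlaps it at or above the threshold. The names `box_iou`, `tube_iou`, and `recall_at` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed representation: a box is (x1, y1, x2, y2); a tube maps frame index -> box.
Box = Tuple[float, float, float, float]
Tube = Dict[int, Box]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def tube_iou(gt: Tube, prop: Tube) -> float:
    """Assumed tube overlap: mean per-frame IoU over the union of frames.

    Frames present in only one of the two tubes contribute an IoU of 0,
    so temporal misalignment is penalized as well as spatial error.
    """
    frames = set(gt) | set(prop)
    if not frames:
        return 0.0
    total = sum(box_iou(gt[f], prop[f]) for f in frames if f in gt and f in prop)
    return total / len(frames)


def recall_at(gts: List[Tube], proposals: List[Tube], thr: float = 0.5) -> float:
    """Fraction of ground-truth tubes matched by at least one proposal with overlap >= thr."""
    if not gts:
        return 0.0
    hits = sum(1 for g in gts if any(tube_iou(g, p) >= thr for p in proposals))
    return hits / len(gts)


# Usage: a proposal shifted by two frames covers only 2 of 6 frames in the union.
gt = {f: (0.0, 0.0, 10.0, 10.0) for f in range(4)}
shifted = {f: (0.0, 0.0, 10.0, 10.0) for f in range(2, 6)}
print(round(tube_iou(gt, shifted), 3))  # 0.333
print(recall_at([gt], [shifted]))       # 0.0
```

Under this definition a small, frequently occluded actor such as Ball is penalized twice: occluded frames reduce the temporal coverage of a proposal, and small boxes make the per-frame IoU sensitive to a few pixels of localization error.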
We also evaluate our approach on the task of action localization on the challenging THUMOS13 dataset. The top row shows three successful cases, visualizing the ground-truth and predicted action tubes along with two highlighted frames. These include action sequences with actor deformation as well as multiple actors against complex backgrounds. The bottom row visualizes three failure cases, which show that crowded backgrounds, occlusions, and temporally untrimmed action sequences are the most challenging scenarios.