PTN-14 A Bag-of-words Equivalent Recurrent Neural Network For Action Recognition

Abstract: The traditional bag-of-words approach has found a wide range of applications in
computer vision. The standard pipeline consists of generating a visual vocabulary, quantizing
the features into histograms of visual words, and classifying the histograms, usually with a
support vector machine combined with a non-linear kernel. Given large
amounts of data, however, the model suffers from a lack of discriminative power. This applies
particularly to action recognition, where the vast number of video features must be
subsampled for unsupervised visual vocabulary generation. Moreover, the kernel computation
can be very expensive on large datasets. In this work, we propose a recurrent neural network that
is equivalent to the traditional bag-of-words approach but enables discriminative training. The
model further allows the kernel computation to be incorporated directly into the neural
network, solving the complexity issue and allowing the complete classification system to be
represented within a single network. We evaluate our method on four recent action recognition
benchmarks and show that it outperforms the conventional model as well as sparse coding
methods.
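
To make the pipeline in the abstract concrete, the following is a minimal sketch of the conventional bag-of-words steps, not the authors' network. It assumes plain k-means for vocabulary generation, hard nearest-word assignment for quantization, and a histogram intersection kernel as one example of a non-linear kernel; the abstract names none of these choices, and all function names are hypothetical. The function `recurrent_histogram` illustrates only why a recurrent formulation is natural: a normalized histogram is a running sum over the feature sequence, so it can be kept as a state that is updated one feature at a time.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vocab, x):
    """Index of the visual word closest to feature x."""
    return min(range(len(vocab)), key=lambda k: sq_dist(vocab[k], x))


def build_vocabulary(features, k, iters=10, seed=0):
    """Assumed vocabulary step: plain k-means (Lloyd's algorithm)."""
    rng = random.Random(seed)
    vocab = [list(f) for f in rng.sample(features, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in features:
            clusters[nearest(vocab, f)].append(f)
        for j, members in enumerate(clusters):
            if members:  # keep the old centre if a cluster is empty
                vocab[j] = [sum(col) / len(members) for col in zip(*members)]
    return vocab


def histogram(vocab, features):
    """Batch quantization: normalized counts of nearest visual words."""
    h = [0.0] * len(vocab)
    for f in features:
        h[nearest(vocab, f)] += 1.0
    return [c / len(features) for c in h]


def recurrent_histogram(vocab, features):
    """Same histogram as a running state updated once per feature."""
    steps = len(features)
    h = [0.0] * len(vocab)  # state before the first feature
    for f in features:
        h[nearest(vocab, f)] += 1.0 / steps  # state update for this step
    return h


def intersection_kernel(h1, h2):
    """Assumed example kernel: histogram intersection."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Toy "video": five 2-D features (made-up values for illustration only).
video = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9], [0.05, 0.05]]
vocab = build_vocabulary(video, k=2)
h_batch = histogram(vocab, video)
h_rec = recurrent_histogram(vocab, video)
```

The two histograms agree up to floating-point rounding. Here the vocabulary stays fixed after clustering; per the abstract, the proposed network goes further by enabling discriminative training.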