Ha, Manh-Hung (2024) Top-Heavy CapsNets Based on Spatiotemporal Non-Local for Action Recognition. Journal of Computing Theories and Applications, 2 (1). pp. 39-50. ISSN 3024-9104
Abstract
To effectively comprehend human actions, we develop a Deep Neural Network (DNN) that exploits inner spatiotemporal non-locality to capture meaningful semantic context for efficient action identification. This work introduces the Top-Heavy CapsNet, a novel approach to video analysis that employs a 3D Convolutional Neural Network (3DCNN) as a local feature extractor for classification based on motion in the spatiotemporal context of videos. The DNN comprises three stages: the 3DCNN, a Spatial Depth-Based Non-Local (SBN) layer, and a Deep Capsule network (DCapsNet). First, the 3DCNN extracts structured and semantic information from the RGB and optical-flow streams. Second, the SBN layer processes feature blocks along the spatial depth to emphasize visually advantageous cues that aid action differentiation. Finally, DCapsNet exploits vectorized prominent features that represent objects and diverse action attributes for the final label determination. Experimental results demonstrate that the proposed DNN achieves an average accuracy of 97.6%, surpassing conventional DNNs on the traffic-police dataset. Furthermore, it attains average accuracies of 98.3% and 80.7% on the UCF101 and HMDB51 datasets, respectively. This underscores the applicability of the proposed DNN to effectively recognizing diverse actions performed by subjects in videos.
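For readers who want a concrete picture of the 3DCNN → non-local → capsule pipeline the abstract describes, below is a minimal PyTorch sketch of a single stream. It is an assumption-laden illustration, not the authors' implementation: the layer widths, the substitution of a generic embedded-Gaussian non-local block (Wang et al., 2018) for the paper's SBN layer, and a simple length-based capsule head standing in for DCapsNet (no dynamic routing, no two-stream fusion) are all choices made here for brevity.

```python
import torch
import torch.nn as nn

def squash(s, dim=-1):
    # Capsule squashing non-linearity: preserves vector orientation
    # while mapping length into [0, 1) so it can act as a class score.
    norm2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * s / (norm2.sqrt() + 1e-8)

class NonLocalBlock3D(nn.Module):
    """Generic spatiotemporal non-local block; a plausible stand-in
    for the paper's SBN layer, not its published definition."""
    def __init__(self, channels):
        super().__init__()
        self.inter = channels // 2
        self.theta = nn.Conv3d(channels, self.inter, kernel_size=1)
        self.phi = nn.Conv3d(channels, self.inter, kernel_size=1)
        self.g = nn.Conv3d(channels, self.inter, kernel_size=1)
        self.out = nn.Conv3d(self.inter, channels, kernel_size=1)

    def forward(self, x):                          # x: (B, C, T, H, W)
        b, c, t, h, w = x.shape
        n = t * h * w
        q = self.theta(x).view(b, self.inter, n)
        k = self.phi(x).view(b, self.inter, n)
        v = self.g(x).view(b, self.inter, n)
        # Pairwise affinity between all spatiotemporal positions.
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)  # (B, N, N)
        y = (v @ attn.transpose(1, 2)).view(b, self.inter, t, h, w)
        return x + self.out(y)                     # residual connection

class TopHeavyCapsNetSketch(nn.Module):
    def __init__(self, in_channels=3, num_classes=101, caps_dim=16):
        super().__init__()
        # Shallow 3DCNN backbone for one stream (RGB or optical flow);
        # depths and widths here are illustrative assumptions.
        self.backbone = nn.Sequential(
            nn.Conv3d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),
            nn.Conv3d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.non_local = NonLocalBlock3D(128)
        self.pool = nn.AdaptiveAvgPool3d(1)
        # One caps_dim-vector capsule per action class.
        self.caps = nn.Linear(128, num_classes * caps_dim)
        self.num_classes, self.caps_dim = num_classes, caps_dim

    def forward(self, clip):                       # clip: (B, C, T, H, W)
        f = self.non_local(self.backbone(clip))
        f = self.pool(f).flatten(1)                # (B, 128)
        caps = self.caps(f).view(-1, self.num_classes, self.caps_dim)
        return squash(caps).norm(dim=-1)           # per-class scores in [0, 1)
```

A forward pass on a random clip, e.g. `TopHeavyCapsNetSketch()(torch.randn(2, 3, 16, 112, 112))`, returns per-class scores; the full model in the paper would additionally fuse the RGB and optical-flow streams before the capsule head.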
Item Type: Article
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Depositing User: dl fts
Date Deposited: 24 Nov 2024 07:08
Last Modified: 24 Nov 2024 08:07
URI: https://dl.futuretechsci.org/id/eprint/24