With the recent proliferation of 3D sensors such as Light Detection and Ranging (LIDAR), it is essential to develop feature representation methods that can best characterize the point clouds produced by these devices. When these devices are employed in targeting and surveillance of human actions from both ground and aerial platforms, the resulting point clouds of body shape often comprise low-resolution, disjoint, and irregular patches of points caused by self-occlusions and viewing-angle variations. The prevailing method of depth image analysis is limited by its reliance on 2D features that are not native representations of 3D spatial relationships. On the other hand, many existing 3D shape descriptors cannot work effectively with these degenerate point clouds because of their dependence on dense, smooth, 360-degree point clouds.
In this research, a new degeneracy-tolerable, multi-scale 3D shape descriptor based on the discrete orthogonal Tchebichef moment, named the Tchebichef moment shape descriptor (TMSD), is proposed as an alternative for single-view partial point cloud representation and characterization. It has the advantage of decomposing a complex 3D surface or volumetric distribution into orthogonal moments in a much more compact subspace that is independent of any learning dataset, thereby supporting accurate, robust, and consistent shape search and pattern recognition in the embedded subspace. Complementary to the proposed descriptor, a new voxelization and normalization scheme is proposed to achieve translation, scale, and resolution invariance; these properties may be less of a concern in traditional full-body 3D shape analysis, but they are crucial requirements for discerning partial point clouds.
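To make the moment decomposition concrete, the following is a minimal sketch, not the implementation described in this research, of computing 3D Tchebichef moments over a cubic voxel grid. Instead of the classical three-term recurrence, it obtains the orthonormal discrete Tchebichef polynomials by QR factorization of a Vandermonde matrix (the discrete Tchebichef polynomials are exactly the polynomials orthogonal under the uniform weight on {0, ..., N-1}, so this recovers them up to sign). The function names `tchebichef_basis` and `tchebichef_moments_3d` are illustrative, not from the source.

```python
import numpy as np

def tchebichef_basis(N, order):
    """Orthonormal discrete Tchebichef polynomials t_0 .. t_{order-1},
    sampled on x = 0 .. N-1. QR on the monomial (Vandermonde) matrix
    performs Gram-Schmidt under the uniform discrete weight, which
    yields the orthonormal Tchebichef polynomials up to sign."""
    x = np.arange(N, dtype=float)
    V = np.vander(x, order, increasing=True)  # columns: 1, x, x^2, ...
    Q, _ = np.linalg.qr(V)                    # orthonormal columns
    return Q.T                                # shape (order, N): row p is t_p

def tchebichef_moments_3d(voxels, order):
    """3D Tchebichef moments T_{pqr} of a cubic voxel grid f(x, y, z):
    T_{pqr} = sum_{x,y,z} f(x,y,z) t_p(x) t_q(y) t_r(z), for p,q,r < order."""
    N = voxels.shape[0]
    T = tchebichef_basis(N, order)
    return np.einsum('px,qy,rz,xyz->pqr', T, T, T, voxels)
```

Because the basis is orthonormal, truncating the moment array to low orders gives a compact, multi-scale representation, and retaining all orders reconstructs the voxel grid exactly.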
To evaluate the effectiveness of the TMSD and voxelization algorithms for static pose shape search and dynamic action recognition, we built a first-of-its-kind multi-subject pose shape baseline consisting of simulated LIDAR captures of actions at different viewing angles. Compared to other existing public datasets, our baseline has more subjects and viewing-angle variations to support solid algorithm development and evaluation. Using the pose shape baseline, we developed a single-view nearest neighbor (NN) search for pose shape retrieval using TMSD. We proved the lower-bounding distance condition under the orthonormality of the Tchebichef moments, which prevents false dismissals in subspace queries. Our experimental results show that 3D TMSD performs significantly better than the 3D discrete Fourier transform (3D DFT) and slightly better than the 3D discrete wavelet transform (3D DWT). It is also more flexible than 3D DWT for multi-scale representation because it is not restricted to dyadic sampling. The action recognition was built on Naive Bayes classifiers using temporal statistics of a 'bag of pose shapes'. Our experiments demonstrate that 3D TMSD-based classification of action and viewing angle outperforms comparable classification based on depth image analysis using the popular 2D histogram-of-oriented-gradients features.
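The lower-bounding property admits a simple pruned NN search: for an orthonormal transform, the Euclidean distance between two truncated moment vectors never exceeds the distance between the full vectors (by Parseval), so pruning on the truncated distance cannot falsely dismiss the true nearest neighbor. The sketch below illustrates this idea under that assumption; the function name and candidate-ordering strategy are illustrative, not taken from the source.

```python
import numpy as np

def nn_search_lower_bound(query, database, k_sub):
    """Exact NN search on full moment vectors, pruned by a subspace
    lower bound. query: (D,) moment vector; database: (M, D) moment
    vectors; k_sub: number of leading coefficients used for the bound.
    For orthonormal moments, the distance on the first k_sub
    coefficients lower-bounds the full distance, so candidates whose
    bound exceeds the best full distance found so far can be skipped."""
    best_idx, best_d2 = -1, np.inf
    # Squared distance in the k_sub-dimensional subspace (the lower bound).
    sub_d2 = np.sum((database[:, :k_sub] - query[:k_sub]) ** 2, axis=1)
    for i in np.argsort(sub_d2):      # visit most promising candidates first
        if sub_d2[i] >= best_d2:      # bound already too large: prune the rest
            break
        d2 = np.sum((database[i] - query) ** 2)
        if d2 < best_d2:
            best_idx, best_d2 = i, d2
    return best_idx
```

Since candidates are visited in increasing order of the lower bound, the search can stop as soon as the bound exceeds the best full distance seen, while still returning the exact nearest neighbor.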
In other experiments, we demonstrated our approach's scale invariance by showing consistent query and classification performance across a wide range of spatial scales, down to an extremely small scale of 6% of the original point clouds, at which level 2D depth image analysis tends to degrade significantly. We also validated performance against varying viewing angles in both azimuth and elevation, which has important implications for aerial sensor platforms.
In summary, many of the performance advantages shown by TMSD stem from its sound mathematical properties. Through direct 3D encoding of the point cloud distribution, our research offers a promising alternative to the usual 2D-based depth image analysis for low-quality, single-view 3D sensor data.