Understanding human behavior through computer vision benefits many different areas. Although considerable progress has been made, human behavior research still targets isolated, low-level, individual activities without considering other important factors, such as human-human interactions, human-object interactions, social roles, and surrounding environments. Numerous publications focus on recognizing a small number of individual activities from body motion features with pattern recognition models, and are satisfied with small improvements in recognition rate. Furthermore, given the complexity of human society, the methods employed in these investigations are far from suitable for real-world use. To address this issue, more attention should be paid to the cognition level rather than the feature level. For a deeper understanding of social behavior, its semantic meaning must be studied against its social context, a task known as social interaction understanding. To initiate such a study, a framework for detecting social interactions needs to be established. In addition to individual body motions, further factors, including social roles, voice, related objects, environment, and other individuals' behaviors, were added to the framework.
To meet these needs, this dissertation proposed a 4-layered hierarchical framework to mathematically model social interactions, and then explored several challenging applications based on the framework to demonstrate its value. No multimodal social interaction dataset was available for this research. Thus, in Research Topic I, two typical scenes were created, with a total of 24 takes (a take is a single shot of a scene), to serve as the social interaction dataset. Topic II introduced the 4-layered hierarchical framework of social interactions, which contained, from bottom to top, 1) a feature layer, 2) a simple behavior layer, 3) a behavior sequence layer, and 4) a pairwise social interaction layer. The top layer generated two persons' joint behaviors in the form of descriptions with semantic meanings. Different statistical models were adopted to handle recognition within each layer. In Topic III, three applications based on the social interaction framework were presented: social engagement, interesting moments, and visualization. The first application measured how strong the interaction was between an interacting pair. The second detected unusual (interesting) individual behaviors and interactions. The third aimed to represent the data visually so that users can access useful information quickly.
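As a rough illustration of the layered structure described above, the following minimal Python sketch chains four layers from bottom to top. Every name, the per-frame speed input, and the 0.5 threshold are hypothetical and chosen only for illustration; in particular, the statistical models used within each layer are replaced here by trivial placeholder rules, so this is not the dissertation's implementation.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Any, Callable, Dict, List


@dataclass
class Layer:
    """One level of the hierarchy (hypothetical structure)."""
    name: str
    transform: Callable[[Any], Any]  # maps the lower layer's output upward


def extract_features(raw: Dict[str, List[float]]) -> Dict[str, List[float]]:
    # Placeholder feature layer: pass per-frame speeds through as floats.
    return {person: [float(s) for s in speeds] for person, speeds in raw.items()}


def label_simple_behaviors(features: Dict[str, List[float]]) -> Dict[str, List[str]]:
    # Placeholder for a statistical classifier: threshold each frame (assumed 0.5).
    return {
        person: ["moving" if s > 0.5 else "still" for s in speeds]
        for person, speeds in features.items()
    }


def build_behavior_sequences(labels: Dict[str, List[str]]) -> Dict[str, List[str]]:
    # Collapse runs of identical per-frame labels into a behavior sequence.
    return {person: [key for key, _ in groupby(seq)] for person, seq in labels.items()}


def describe_pair(sequences: Dict[str, List[str]]) -> str:
    # Placeholder pairwise layer: join both persons' sequences into one description.
    return " | ".join(
        f"{person}: {' -> '.join(seq)}" for person, seq in sequences.items()
    )


# Bottom-to-top ordering of the four layers named in the text.
LAYERS = [
    Layer("feature", extract_features),
    Layer("simple behavior", label_simple_behaviors),
    Layer("behavior sequence", build_behavior_sequences),
    Layer("pairwise social interaction", describe_pair),
]


def run_hierarchy(layers: List[Layer], raw: Any) -> Any:
    # Feed each layer's output into the layer above it.
    out = raw
    for layer in layers:
        out = layer.transform(out)
    return out


if __name__ == "__main__":
    raw_speeds = {"A": [0.0, 0.1, 0.9, 1.0], "B": [0.8, 0.9, 0.1, 0.0]}
    print(run_hierarchy(LAYERS, raw_speeds))
```

With the toy input shown, the top layer yields the joint description `A: still -> moving | B: moving -> still`. The point of the sketch is only the data flow: each layer consumes the output of the layer below it, and the top layer emits a description covering both persons.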
All experiments in Research Topics II and III were based on the social interaction dataset created for the study. The performance of each layer was evaluated by comparing the experimental results with those reported in the existing literature. The framework was shown to capture and model certain social interactions successfully, and it can be applied to other situations. The pairwise social interaction layer generated joint behaviors with high accuracy because of the coupled nature of the model. The exploration of social engagement, interesting moments, and visualization shows the practical value of the current research and may stimulate discussion and further research in the area.