Multimodal Personality Recognition using Cross-Attention Transformer and Behaviour Encoding. (arXiv:2112.12180v1 [cs.CV])
Personality computing and affective computing have recently gained interest in
many research areas. Datasets for these tasks generally include multiple
modalities, such as video, audio, language and bio-signals. In this paper, we
propose a flexible model for this task that exploits all available data. The
task involves complex relations, and to avoid using a large model specifically
for video processing, we propose the use of behaviour encoding, which boosts
performance with minimal change to the model. Cross-attention using
transformers has become popular recently and is used here to fuse the
different modalities. Since long-term relations may exist, breaking the input
into chunks is undesirable, so the proposed model processes the entire input
together. Our experiments show the importance of each of the above
contributions.
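Cross-attention fusion of two modalities, with the whole sequence processed at once rather than in chunks, can be illustrated with a minimal single-head sketch. This is not the authors' implementation. The choice of audio as the query modality and video as the context modality, the single head, the tiny dimensions, and the function names (`cross_attention`, `matmul`, `transpose`, `softmax`) are all hypothetical and chosen for illustration. Pure Python is used so the sketch has no dependencies.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x k) @ (k x m) -> (n x m)
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_mod: Matrix, context_mod: Matrix,
                    w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Hypothetical single-head scaled dot-product cross-attention.

    Queries come from one modality (e.g. audio frames) and keys/values
    from another (e.g. video frames). Every query step attends over ALL
    context steps, i.e. the full sequence is processed without chunking.
    """
    q = matmul(query_mod, w_q)          # (T_q x d)
    k = matmul(context_mod, w_k)        # (T_c x d)
    v = matmul(context_mod, w_v)        # (T_c x d_v)
    d = len(w_q[0])
    scores = matmul(q, transpose(k))    # (T_q x T_c)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)           # (T_q x d_v): context fused per query step


if __name__ == "__main__":
    # Toy features: 3 audio steps and 2 video steps, both 2-dimensional.
    audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    video = [[1.0, 0.0], [0.0, 1.0]]
    eye = [[1.0, 0.0], [0.0, 1.0]]  # identity projections for readability
    for row in cross_attention(audio, video, eye, eye, eye):
        print([round(x, 3) for x in row])
```

With identity projections, each output row is simply the attention distribution over the video steps. The third audio step, `[1, 1]`, scores both video steps equally and therefore receives an even mix of them. In a real model the projection matrices are learned, several heads are used, and the fused output would feed further transformer layers.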
Source: https://arxiv.org/abs/2112.12180