Transformer-F: A Transformer network with effective methods for learning universal sentence representation. (arXiv:2107.00653v1 [cs.CL])

The Transformer model is widely used in natural language processing for
sentence representation. However, previous Transformer-based models tend to
attend to function words, which carry little meaning in most contexts, and
extract only high-level semantic abstractions. In this paper, we introduce two
approaches to improve the performance of Transformers. First, we compute the
attention score by multiplying a part-of-speech weight vector with the
correlation coefficient, which helps the model attend to words with more
practical meaning. The weight vector is derived from the input text sequence
according to the importance of each part of speech.
importance of the part-of-speech. Furthermore, we fuse the features of each
layer to make the sentence representation results more comprehensive and
accurate. In experiments, we demonstrate the effectiveness of our model
Transformer-F on three standard text classification datasets. Experimental
results show that our proposed model significantly boosts the performance of
text classification as compared to the baseline model. Specifically, we obtain
a 5.28% relative improvement over the vanilla Transformer on the simple tasks.
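The abstract does not give the exact formulation, but the core idea of scaling attention scores by part-of-speech importance can be sketched as follows. This is a minimal illustration, assuming a single attention head and a per-token POS weight vector supplied externally (the function name, weighting scheme, and inputs are hypothetical, not the paper's actual implementation):

```python
import numpy as np

def pos_weighted_attention(Q, K, V, pos_weights):
    """Scaled dot-product attention with part-of-speech re-weighting.

    pos_weights is a (seq_len,) vector assigning higher importance to
    content words (nouns, verbs) than to function words. This is a
    hypothetical sketch of the idea described in the abstract, not the
    paper's exact method.
    """
    d_k = Q.shape[-1]
    # Correlation-style attention scores, as in the vanilla Transformer.
    scores = Q @ K.T / np.sqrt(d_k)
    # Element-wise scaling by POS importance of each key position.
    scores = scores * pos_weights[None, :]
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

In this sketch, tokens tagged with low-importance parts of speech receive attenuated scores before the softmax, so their values contribute less to the output representation.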


