SAT Based Analogy Evaluation Framework for Persian Word Embeddings. (arXiv:2106.15674v1 [cs.CL])

In recent years there has been particular interest in word embeddings as an
approach to converting words to vectors. A central question is how much of the
semantics of the words is captured in the embedding vectors. This matters
because the embedding serves as the basis for downstream NLP applications, and
evaluating an application end-to-end just to assess the quality of the
underlying embedding model is costly. Word embeddings are generally evaluated
through a number of tests, including the analogy test. In this paper we propose
a test framework for Persian
embedding models. Persian is a low-resource language, and there is no rich
semantic benchmark for evaluating word embedding models in this language. We
introduce an evaluation framework comprising a handcrafted Persian SAT-based
analogy dataset, a colloquial test set (specific to Persian), and a
benchmark to study the impact of various parameters on the semantic evaluation
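
To make the idea of a SAT-style analogy test concrete, the following is a
minimal sketch of how one multiple-choice analogy question might be scored
against an embedding model. The abstract does not state the scoring rule, so
the vector-offset cosine comparison used here is an assumption, and the names
answer_sat_question, sat_accuracy, and toy_emb are hypothetical. The toy
vectors and English placeholder words are illustrative only and are not taken
from the Persian dataset.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def offset(emb, pair):
    """Relation vector of a word pair (a, b), computed as emb[b] - emb[a]."""
    a, b = pair
    return [y - x for x, y in zip(emb[a], emb[b])]


def answer_sat_question(emb, stem, choices):
    """Return the index of the choice pair whose offset best matches the stem's.

    Assumption: relations are compared by cosine similarity of offsets.
    """
    stem_offset = offset(emb, stem)
    scores = [cosine(stem_offset, offset(emb, c)) for c in choices]
    return max(range(len(choices)), key=scores.__getitem__)


def sat_accuracy(emb, questions):
    """Fraction of questions for which the predicted index equals the answer."""
    correct = sum(
        answer_sat_question(emb, q["stem"], q["choices"]) == q["answer"]
        for q in questions
    )
    return correct / len(questions)


# Hypothetical 2-D toy embeddings; real models use hundreds of dimensions.
toy_emb = {
    "man": [1.0, 0.0], "woman": [1.0, 1.0],
    "king": [3.0, 0.0], "queen": [3.0, 1.0],
    "apple": [0.0, 2.0], "car": [2.0, 2.0],
}

questions = [
    {
        "stem": ("man", "woman"),
        "choices": [("apple", "car"), ("king", "queen"), ("queen", "king")],
        "answer": 1,  # index of the correct choice pair
    }
]

print(sat_accuracy(toy_emb, questions))
```

Comparing offsets of whole pairs, rather than searching the vocabulary for a
single missing word, matches the multiple-choice shape of SAT questions and
keeps the score independent of vocabulary size.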



