Abstract
We have developed a knowledge base of words as a tool for measuring the semantic similarity between words. In this paper, we evaluate the knowledge base by comparing it with thesauri, which are commonly used for measuring similarity. The thesauri of NIHONGO-GOI-TAIKEI and the Japan Electronic Dictionary are selected for the evaluation. To calculate similarity with the thesauri, we use traditional methods based on the path length between categories or the depth of the subsumer, and in addition adopt a newly proposed method in which each word is represented as a vector derived from the structural features of the thesaurus and the degree of similarity between two words is calculated as the inner product of their vectors. The evaluation is carried out in two ways: a traditional method based on human ratings, and a method we proposed previously that can be run automatically without human judgment. The results show that the knowledge base of words is superior to both thesauri as a measurement tool, and that the proposed calculation method outperforms the traditional ones. The correlation between the two evaluation methods also shows that our automatic evaluation method is practical.