| Word sense disambiguation is an essential topic in computational linguistics. Using analytical software, researchers combine context information with other relevant information about the ambiguous words in a text to determine their senses. Feature selection for word sense disambiguation directly affects the results of natural language processing tasks such as information retrieval, machine translation and text classification. Researchers have made great progress in word sense disambiguation, but its accuracy still needs to be improved, and the main research objects remain verbs and nouns. Prepositions are an important part of speech in English, and their polysemy directly affects natural language processing. Therefore, feature selection for word sense disambiguation has significant research value. This thesis studies feature selection for the preposition around from the perspective of word sense disambiguation and analyzes the contribution of each optimal feature to the disambiguation of around. The sentences in SemEval-2007 from The Preposition Project (TPP) are used as the research data. A formal context, consisting of objects and the attributes of those objects, is constructed under the guidance of formal concept analysis. According to TPP, the preposition around has six senses. The features comprise semantic features expressed by pointwise mutual information (PMI) and syntactic features extracted from the corpus based on the probability of co-occurrence with around. A method combining the filter method with the attribute partial order structure diagram is applied to the word sense disambiguation of around. On this basis, the optimal features for disambiguating around are explored. The thesis finds that the accuracy reaches 81.29% with the attribute partial order structure diagram method and 92.26% with the filter feature selection method. The accuracy using the combination 
of the filter method and the attribute partial order structure diagram is 93.55%, which is 37.45 percentage points higher than the state of the art (56.1%) in The Preposition Project. On this basis, the feature selection research is carried out. First, according to how the features apply to word sense disambiguation in the testing set, 23 optimal features are selected. They are: the PMI between sense 1(1) of around and its complement is less than or equal to 0; the PMI between sense 1(1) of around and its complement is higher than or equal to 1.16; the PMI between sense 3(2) of around and its complement is less than or equal to 0; the PMI between sense 3(2) of around and its complement is higher than or equal to 0.91; the PMI between sense 4(3) of around and its complement is less than or equal to 0; the PMI between sense 4(3) of around and its complement is between 0 and 1.73; the PMI between sense 4(3)-1 of around and its complement is less than or equal to 0; the PMI between sense 4(3)-1 of around and its complement is higher than or equal to 1.4; the PMI between sense 5(4) of around and its complement is less than or equal to 0; the PMI between sense 5(4) of around and its complement is between 0 and 1.39; the PMI between sense 5(4) of around and its complement is higher than or equal to 1.39; the prepositional phrase acts as an adverbial adjunct in the sentence; the prepositional phrase acts as the complementation of a verb in the sentence; there are no words between the verb and the preposition; there are nouns or adjectives between nouns and prepositions; the complement of the preposition is a pronoun; the complement of the preposition is a noun or noun phrase; the verb to which the preposition attaches is a state verb; the verb to which the preposition attaches is an action verb; the verb to which the preposition attaches is an action-process verb; the complement of the preposition is a human or a part of the human body; the complement of the preposition is a concrete object; the complement of the preposition is a space. Semantic attributes contribute more than syntactic attributes to the word sense disambiguation of the preposition around. The research findings provide a knowledge basis and experimental support for further studies on prepositions and also provide 
a methodological reference for feature selection in word sense disambiguation. |
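
The PMI-based attributes listed in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy sense/complement pairs, the base-2 logarithm, the function names (`pmi`, `pmi_bin`, `formal_context`), the treatment of a never-observed pair as negative infinity, and the cut-offs `low=0.0` and `high=1.0` are all assumptions made for illustration. The thesis reports its own sense-specific thresholds (for example 1.16 or 1.39). Each distinct complement is treated as an object, and each thresholded PMI value becomes one binary attribute of the formal context.

```python
import math
from collections import Counter


def pmi(pair_count, x_count, y_count, total):
    """Pointwise mutual information log2(P(x, y) / (P(x) * P(y))) from raw counts."""
    if pair_count == 0:
        # Assumption: a pair that never co-occurs gets -inf, so it falls in the "<= low" bin.
        return float("-inf")
    return math.log2(pair_count * total / (x_count * y_count))


def pmi_bin(score, low=0.0, high=1.0):
    """Discretize a PMI score into one of three mutually exclusive labels."""
    if score <= low:
        return "le_low"
    if score >= high:
        return "ge_high"
    return "mid"


def formal_context(pairs, low=0.0, high=1.0):
    """Map each complement (object) to its set of binary PMI attributes, one per sense."""
    total = len(pairs)
    pair_counts = Counter(pairs)
    sense_counts = Counter(sense for sense, _ in pairs)
    word_counts = Counter(word for _, word in pairs)
    context = {}
    for word, w_count in word_counts.items():
        attributes = set()
        for sense, s_count in sense_counts.items():
            score = pmi(pair_counts[(sense, word)], s_count, w_count, total)
            attributes.add(f"{sense}:{pmi_bin(score, low, high)}")
        context[word] = attributes
    return context


# Hypothetical (sense label, complement head) pairs; not taken from the TPP data.
TOY_PAIRS = (
    [("1(1)", "table")] * 2
    + [("1(1)", "town")] * 2
    + [("5(4)", "town")] * 2
    + [("5(4)", "noon")] * 2
)

if __name__ == "__main__":
    for complement, attributes in formal_context(TOY_PAIRS).items():
        print(complement, sorted(attributes))
```

In this toy data, "table" occurs only with sense 1(1), so its PMI with that sense is high, while "town" is shared evenly between the two senses and has a PMI of 0 with both. Binary attributes of this kind are what a filter method would then rank and prune.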