English modal verbs express speakers’ thoughts and attitudes toward a state or action, which can be decisive for semantic recognition, and they are therefore widely used in communication. However, their senses are complicated and ambiguous, which not only leads to misuse and misunderstanding in human communication but also causes difficulties in natural language processing. Hence, the semantic classification of English modal verbs is a significant topic, and feature selection for sense classification is of great importance in linguistic studies. This thesis studies feature selection for sense classification of the primary English modal verbs can and may based on the Filter-APOSD approach, and analyzes the contribution of semantic and syntactic features to their word sense disambiguation. In this study, the meaning of the target word may is divided into three senses: root possibility, epistemic possibility and permission; the meaning of the target word can is likewise divided into three senses: ability, root possibility and permission. For each verb, 300 sample sentences are selected from a self-built five-million-word corpus. Next, the features extracted from these sentences are taken as attributes to construct a formal context for each verb in accordance with the theory of formal concept analysis. Based on the two formal contexts, the APOSD diagrams are generated and the final optimal feature sets for sense classification are extracted by the Filter-APOSD approach. The final sense classification accuracy is 95.00% for can and 98.33% for may, which demonstrates the effectiveness of the method. Based on the extracted rules, 33 optimal features are selected for sense classification of can, and 34 for may. By analyzing the selected features, knowledge about the two modal verbs is discovered.

The major findings are as follows. Firstly, the most important features for sense classification of can include the speaker’s authority over the subject, harmonic combination (co-occurrence with I think, I suppose, etc.) and law- or regulation-related topics. These features directly constrain the meaning of can. Secondly, the most important features for sense classification of may include progressive aspect, the collocation may or may not, and harmonic combination (co-occurrence with I think, I suppose, etc.). These features directly constrain the meaning of may. Thirdly, semantic features have greater generalizing power, while syntactic features have greater classifying power. Fourthly, syntactic features contribute more than semantic features to the sense classification of can and may. The research findings provide a knowledge basis and experimental support for further studies on English modal verbs and natural language processing, as well as a reference for feature selection and machine translation.
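
The formal context described in this abstract can be illustrated with a minimal Python sketch. It shows only the two standard derivation operators of formal concept analysis (extent and intent) on a toy context; it is not the Filter-APOSD procedure and does not use the thesis data. The sentence IDs, feature names, sense labels and function names below are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical toy formal context: objects are sample sentences,
# attributes are binary features observed in each sentence.
CONTEXT: Dict[str, FrozenSet[str]] = {
    "s1": frozenset({"progressive_aspect"}),
    "s2": frozenset({"harmonic_combination"}),
    "s3": frozenset({"speaker_authority"}),
    "s4": frozenset({"speaker_authority", "regulation_topic"}),
}

# Hypothetical sense annotation for each sample sentence.
SENSES: Dict[str, str] = {
    "s1": "epistemic possibility",
    "s2": "epistemic possibility",
    "s3": "permission",
    "s4": "permission",
}


def extent(attributes: Iterable[str], context: Dict[str, FrozenSet[str]]) -> Set[str]:
    """Return every object that has all of the given attributes."""
    wanted = set(attributes)
    return {obj for obj, feats in context.items() if wanted <= feats}


def intent(objects: Iterable[str], context: Dict[str, FrozenSet[str]]) -> Set[str]:
    """Return every attribute shared by all of the given objects."""
    chosen = [set(context[obj]) for obj in objects]
    if not chosen:
        # By convention, the empty object set shares all attributes.
        return set().union(*context.values())
    return set.intersection(*chosen)


def senses_covered(attributes: Iterable[str]) -> Set[str]:
    """Return the senses of all toy sentences that carry the given attributes."""
    return {SENSES[obj] for obj in extent(attributes, CONTEXT)}


if __name__ == "__main__":
    print(sorted(extent({"speaker_authority"}, CONTEXT)))
    print(sorted(intent({"s3", "s4"}, CONTEXT)))
    print(sorted(senses_covered({"speaker_authority"})))
```

In this toy context, a feature whose extent contains sentences of only one sense is one that fully determines the sense; this is one simple way to read the statement that certain features directly constrain the meaning of a modal verb.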