| Anaphora is common in everyday language and is one of the main devices that hold a discourse together, keeping it clear and concise. Resolving anaphora helps a computer analyze and understand discourse, and it is widely applied in automatic text summarization, question answering, and machine translation. As discourse processing has developed, anaphora resolution has grown in importance and has become an active research topic. In this paper, we describe the importance of anaphora resolution to Natural Language Processing and summarize the theoretical research on it and the methods used to implement it. Using the Chinese Treebank corpus, we implement an approach based on feature extraction and feature weighting, as well as a method based on machine learning. Feature extraction is the focus of the paper. Extracting features for anaphors and antecedent candidates is the prerequisite for constructing the antecedent candidate set: the Person, Gender, and Number features are extracted for this purpose. The features extracted for each candidate pair are the sole basis for deciding which candidate in the set is the antecedent. Eight features are extracted for each candidate pair and weighted to produce a weight value, and the antecedent is the word in the candidate pair with the highest weight value. Collocation is a special linguistic phenomenon in which the words that form a collocation are indicative of one another. In this paper, collocation is used to extract the semantic feature and to improve the extraction of statistical features, which makes it important to feature extraction. As a machine learning algorithm, the Support Vector Machine (SVM) can gather as much information for classification as possible.
In this paper, anaphora resolution is treated as a special classification problem: for each anaphor, only one of the candidate pairs formed by the anaphor and the antecedent candidates in the candidate set can be classified as positive. SVMlight is used to implement the machine learning approach to anaphora resolution. The features extracted for the weighting approach serve as the basis for classification, and the intermediate result is used to determine which word is the antecedent. With the feature extraction approach improved by collocation, we obtain an accuracy of 86.37% on the whole Chinese Treebank corpus, and with the SVM-based approach we obtain an accuracy exceeding 90% on the special corpus. The use of collocation also contributes an improvement of more than 10% to anaphora resolution. The experiments demonstrate the value of our approaches for anaphora resolution. |
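
The weighting approach summarized above (agreement filtering on Person, Gender, and Number, followed by a weighted score for each candidate pair) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not name the eight features or their weights, so the three features, the `WEIGHTS` values, and the names `Mention`, `agrees`, `pair_features`, and `resolve` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

UNKNOWN = "?"  # assumption: marks a feature value that could not be determined


@dataclass(frozen=True)
class Mention:
    text: str
    person: int   # 1, 2, or 3
    gender: str   # "m", "f", "n", or UNKNOWN
    number: str   # "sg", "pl", or UNKNOWN


def compatible(a: str, b: str) -> bool:
    """Two feature values are compatible if either is unknown or they are equal."""
    return a == UNKNOWN or b == UNKNOWN or a == b


def agrees(anaphor: Mention, cand: Mention) -> bool:
    """Person/Gender/Number agreement test used to build the candidate set."""
    return (anaphor.person == cand.person
            and compatible(anaphor.gender, cand.gender)
            and compatible(anaphor.number, cand.number))


# Hypothetical weights for three illustrative features (the paper uses eight).
WEIGHTS: Dict[str, float] = {"recency": 0.5, "gender_exact": 0.3, "number_exact": 0.2}


def pair_features(anaphor: Mention, cand: Mention, distance: int) -> Dict[str, float]:
    """Features of one (anaphor, candidate) pair; distance 0 is the nearest mention."""
    return {
        "recency": 1.0 / (1 + distance),
        "gender_exact": float(anaphor.gender == cand.gender),
        "number_exact": float(anaphor.number == cand.number),
    }


def resolve(anaphor: Mention, preceding: List[Mention]) -> Optional[Mention]:
    """Return the agreeing candidate with the highest weighted score, or None."""
    best, best_score = None, float("-inf")
    # Walk backwards so that distance counts mentions between candidate and anaphor.
    for distance, cand in enumerate(reversed(preceding)):
        if not agrees(anaphor, cand):
            continue  # filtered out of the candidate set
        feats = pair_features(anaphor, cand, distance)
        score = sum(WEIGHTS[name] * value for name, value in feats.items())
        if score > best_score:
            best, best_score = cand, score
    return best


if __name__ == "__main__":
    context = [
        Mention("小明", 3, "m", "sg"),
        Mention("小红", 3, "f", "sg"),
        Mention("书", 3, "n", "sg"),
    ]
    print(resolve(Mention("她", 3, "f", "sg"), context).text)  # prints 小红
```

In the SVM-based variant described in the abstract, the same per-pair feature vectors would be passed to a classifier in place of the fixed `WEIGHTS`, and the pair with the highest classifier output would be selected as the antecedent.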