The Research Of Subjective Sentence Extraction Method For Texts On Network

Posted on: 2013-09-23
Degree: Master
Type: Thesis
Country: China
Candidate: W W Zhang
GTID: 2298330422974197
Subject: Computer Science and Technology

Abstract/Summary:
Information on the network is now produced and spread very quickly, which makes it faster and more convenient for people to get information. At the same time, the volume of redundant information keeps growing, and it is becoming harder for people to obtain the information that is valuable. The reconstruction and retrieval of information has therefore become a hot research topic. Subjective sentence extraction is one means of information extraction: it is designed to extract the sentences that contain the author's subjective view, and it is being applied more and more broadly.

As the number of Internet users in China increases, especially the number of young users, network language has become more and more non-standard, and even crude. We studied subjective sentence extraction for non-standard texts, focusing on the following problems: how to standardize non-standard texts; whether domain nouns are related to subjective sentences; how to construct an unsupervised subjective sentence extraction method; and how different feature selection algorithms influence subjective sentence extraction for non-standard texts when training the classifier. The research we carried out is as follows:

(1) We summarized the non-standard items found in network texts and proposed an approach that improves the corpus and its word segmentation results, in order to reduce the negative effects of the non-standard items.

(2) We put forward an unsupervised subjective sentence extraction method. Its main idea is to obtain training examples automatically by combining an accurate-extraction method with a broad-extraction method. The accurate-extraction method is rule-based and has high accuracy. The broad-extraction method is lexicon-based, uses a very broad sentiment lexicon, and has high recall. We used the sentences extracted by the two methods to construct the training samples, trained an SVM classifier on these samples, and then used the trained classifier to extract subjective sentences from the corpus.

(3) We used domain nouns in the rules of the accurate-extraction method. The domain nouns are extracted automatically from the corpus. We ran an experiment to study whether domain nouns are related to subjective sentences.

(4) In the process of training the SVM classifier, we compared the performance of two feature selection algorithms: mutual information and the chi-square statistic.

The experimental results show that the proposed subjective sentence extraction method performs better among unsupervised methods and has a certain ability to cope with non-standard texts. The approach we proposed to improve the corpus and its segmentation results is effective. Domain nouns are related to subjective sentences, and using them in the rules of the accurate-extraction method improves the capability of the proposed extraction method. Compared with the chi-square statistic, mutual information performs better when selecting features to train the SVM classifier. A better feature selection algorithm can reduce the negative effects of non-standard items when extracting subjective sentences.
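The abstract gives no implementation details, so the following is only a minimal sketch of how items (2) and (4) might fit together. It rests on several assumptions that are not in the source: the toy word lists, the specific rule (a domain noun co-occurring with a strong opinion word), the choice to treat accurate-extraction hits as positive samples and sentences missed even by the broad lexicon as negative samples, and the use of the pointwise form of mutual information. All function and variable names are hypothetical. SVM training itself is omitted.

```python
import math

# Hypothetical toy resources; the thesis's actual rules and lexicon are not
# described in the abstract.
DOMAIN_NOUNS = {"battery", "screen"}
STRONG_OPINION = {"love", "hate", "terrible"}
BROAD_LEXICON = STRONG_OPINION | {"good", "bad", "nice", "slow", "like"}


def accurate_extract(tokens):
    """Rule-based and precision-oriented (assumed rule):
    a domain noun together with a strong opinion word."""
    words = set(tokens)
    return bool(words & DOMAIN_NOUNS) and bool(words & STRONG_OPINION)


def broad_extract(tokens):
    """Lexicon-based and recall-oriented: any word from a broad lexicon."""
    return bool(set(tokens) & BROAD_LEXICON)


def pseudo_label(sentences):
    """Build training samples without manual annotation.

    Assumption: accurate hits become positives (1), sentences that even the
    broad method misses become negatives (0), and the rest stay unlabeled.
    """
    labeled = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        if accurate_extract(tokens):
            labeled.append((tokens, 1))
        elif not broad_extract(tokens):
            labeled.append((tokens, 0))
    return labeled


def contingency(labeled, term):
    """Counts for one term: (a, b, c, d) =
    (term & subjective, term & objective, no term & subjective, no term & objective)."""
    a = sum(1 for tokens, y in labeled if term in tokens and y == 1)
    b = sum(1 for tokens, y in labeled if term in tokens and y == 0)
    c = sum(1 for tokens, y in labeled if term not in tokens and y == 1)
    d = sum(1 for tokens, y in labeled if term not in tokens and y == 0)
    return a, b, c, d


def chi_square(a, b, c, d):
    """Chi-square statistic of the 2x2 term/class table."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def mutual_information(a, b, c, d):
    """Pointwise mutual information between the term and the subjective class."""
    n = a + b + c + d
    if a == 0:
        return float("-inf")
    return math.log(a * n / ((a + c) * (a + b)))


if __name__ == "__main__":
    corpus = [
        "I love the battery",
        "The screen is terrible",
        "It has a good price",
        "The phone ships on Monday",
        "The battery is 3000 mAh",
    ]
    samples = pseudo_label(corpus)
    for term in ("love", "battery"):
        counts = contingency(samples, term)
        print(term, chi_square(*counts), mutual_information(*counts))
```

In a fuller pipeline, the top-scoring terms under each criterion would serve as the feature set for the SVM classifier, which is where the comparison in item (4) would take place.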
Keywords/Search Tags: Subjective Sentence, Non-standard, Unsupervised, Domain Noun, Feature Selection