Research Of Mining On Recognition Feature Of Two-Sentential Generalized Coordinate Compound Sentences

Posted on: 2017-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: Q J Zhu
GTID: 2428330488485680
Subject: Computer software and theory
Abstract/Summary:
Research on Chinese information processing has moved from the stage of character and word processing to the stage of sentence and text processing. The study of Chinese compound sentences belongs to sentence processing and is also the basis of text processing. Compound sentence processing includes recognizing the relation categories of compound sentences, syntactic analysis of compound sentences, analysis of their hierarchical structure, and so on. The common method for recognizing relation categories is to first identify the relation words in a compound sentence and then judge the sentence's relation category from the categories of those relation words. This paper instead studies relation-category recognition through the differences in the semantic relations between words in the two clauses of sentences of different categories. The research objects are two-sentential compound sentences, that is, compound sentences with exactly two clauses.

This paper first analyzes the sentences in the corpus with dependency parsing and, based on the parsing results, selects the sentences that meet its requirements as the experimental corpus. Quasi-relation-words are extracted, and the experimental corpus is classified into categories according to the correspondence between relation words and relation categories. The paper then extracts the center words of the two clauses and the nouns that depend on those center words under Dependency Grammar. The similarity and relevance between the center words and between the nouns are computed, and the two groups of results are counted and analyzed by category. The statistics show that coordinate compound sentences differ from other compound sentences in both the distribution of similarity between center words and the distribution of similarity between nouns. Analysis of the corpus further shows that coordinate compound sentences also differ from other compound sentences in the subjects of their two clauses.

Finally, this paper uses these differences as features, represents them as feature vectors, and builds training and test sets that separate coordinate compound sentences from other compound sentences. A naive Bayesian model is then trained and tested to recognize coordinate compound sentences. The recognition experiment on the selected two-sentential compound sentences achieves a relatively high accuracy. The same experiment is also carried out using only the difference in clause subjects as a feature, and its results are compared with those of the first experiment. The comparison shows that adding the distribution of similarity between center words and the distribution of similarity between nouns as features gives better recognition of coordinate compound sentences than using the difference in clause subjects alone. The two experiments show that the approach in this paper is feasible and that effective features for recognizing coordinate compound sentences can be mined in this way.
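The final classification step can be illustrated with a minimal sketch. It is not the author's implementation: the abstract names only a "naive Bayesian model", so the Gaussian variant used here is an assumption, as are the three feature names (center-word similarity, noun similarity, and a flag for whether the two clauses share a subject), the helper functions, and the toy values, none of which come from the thesis data.

```python
import math

def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small variance floor keeps the log-density finite
        # when a feature is nearly constant within a class.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-3
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(y), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest log-posterior for feature vector x."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))

# Hypothetical toy vectors (invented for illustration only):
# [center_word_similarity, noun_similarity, same_subject_flag]
X_train = [
    [0.82, 0.75, 1.0],
    [0.78, 0.66, 1.0],
    [0.90, 0.71, 0.0],
    [0.85, 0.80, 1.0],
    [0.30, 0.25, 0.0],
    [0.41, 0.35, 0.0],
    [0.22, 0.40, 1.0],
    [0.35, 0.18, 0.0],
]
y_train = ["coordinate"] * 4 + ["other"] * 4

model = fit_gaussian_nb(X_train, y_train)
print(predict(model, [0.80, 0.70, 1.0]))
print(predict(model, [0.28, 0.30, 0.0]))
```

Dropping the first two columns from each vector would mimic the comparison experiment that uses only the clause-subject feature, although the thesis may encode its features differently (for example, as discretized similarity ranges).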
Keywords/Search Tags: two-sentential compound sentence, relation category, center words, Bayesian model