
Research Of Chinese Text Preprocessing Based On Semantic

Posted on: 2012-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: N Zhang
Full Text: PDF
GTID: 2178330332488289
Subject: Information Science
Abstract/Summary:
Chinese text classification must handle a complicated and flexible language that contains complex semantic relations such as similarity and polysemy. Compared with English text, Chinese text has specific characteristics in its structure and in its feature extraction mechanism. However, research on the semantic features of Chinese text still lacks a mature mechanism, so study of this subject is necessary.

This paper mainly studies word segmentation, POS tagging and feature extraction, based on the characteristic that Chinese text is rich in semantic information. First, the features of Chinese text and the difficulties in processing it are analyzed, along with the current state of research on text classification and preprocessing. Then an algorithm for eliminating ambiguity in word segmentation using a semantic knowledge repository is proposed, as well as a method of POS tagging that uses labeled semantic information combined with a dictionary. During feature extraction, concepts are clustered according to the semantic concepts provided by the semantic knowledge repository, so that the dimension of the feature space is reduced. Finally, a semantics-based word segmentation and POS tagging system is designed according to the algorithm proposed in this paper, the results are analyzed experimentally, and the effectiveness of the semantic feature extraction method is also verified. Experimental results show that a large amount of segmentation ambiguity can be resolved in this way, the accuracy of word segmentation and POS tagging is increased, and the dimension of the feature space is reduced, so that the overall performance of classification can be improved.

Text preprocessing is the fundamental part of text classification, which is largely affected by its performance. Thus, research on semantics-based Chinese text preprocessing is of great importance for better representing text and for improving the results of text preprocessing and the accuracy and efficiency of classification...
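The concept clustering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm, which the abstract does not specify. It assumes a toy word-to-concept dictionary (`CONCEPT_OF`, a hypothetical stand-in for a semantic knowledge repository) and a hypothetical helper `cluster_by_concept`, and it only shows how merging words that share a concept shrinks the feature space.

```python
from collections import Counter

# Hypothetical toy "semantic knowledge repository": word -> concept label.
# A real repository would be far larger; these entries are illustrative only.
CONCEPT_OF = {
    "电脑": "computer",
    "计算机": "computer",
    "微机": "computer",
    "医生": "doctor",
    "大夫": "doctor",
}

def cluster_by_concept(tokens, concept_of=CONCEPT_OF):
    """Count features after merging words that share a concept.

    Words missing from the repository are kept as their own feature.
    """
    return Counter(concept_of.get(tok, tok) for tok in tokens)

# Already-segmented tokens (segmentation itself is out of scope here).
tokens = ["电脑", "计算机", "医生", "大夫", "微机", "学校"]

word_features = Counter(tokens)                # one feature per distinct word
concept_features = cluster_by_concept(tokens)  # one feature per concept

print(len(word_features), "word features ->", len(concept_features), "concept features")
```

In this toy input, synonyms collapse onto a shared concept label, so the classifier sees fewer and denser features. Words absent from the dictionary pass through unchanged, which is one simple way to handle repository gaps.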
Keywords/Search Tags: Text classification, Chinese text preprocessing, Semantic feature, Semantic knowledge repository, Semantic word segmentation