Studies On And Implementation Of Selected Topics In Chinese Information Processing

Posted on:2009-12-22

Degree:Master

Type:Thesis

Country:China

Candidate:L J Luo

GTID:2178360242474992

Subject:Computer application technology

Abstract/Summary:

With the rapid development of the Internet, resources of all kinds are continuously increasing. To find information quickly and efficiently, information processing (IP) has become one of the most important research fields.

In this thesis, we discuss some key issues in Chinese information processing. The major contents and contributions are as follows:

Implementation of a dictionary-based syntactic parsing method. By mapping grammar rules onto feature words, parsing is converted into generating a syntactic decision tree from the features of the words, so that the rule-based method and the probabilistic method are effectively combined. In our closed test, the precision and recall are 87.13% and 89.40%, respectively.

An improvement of the K-means clustering method based on sample distance is proposed. It effectively avoids errors caused by the choice of the initial points, as well as the impact of noise and isolated points.

The thesis also introduces several dictionary structures for storing a corpus and how they are used. Based on the characteristics of words, we use multi-level hash storage together with forward maximum matching to achieve a fast segmentation algorithm; with 1 GB of RAM, the word segmentation speed is 2 megabytes per second. By combining Hidden Markov Model POS tagging with a smoothing algorithm, we obtain a tagging precision of 86% and a disambiguation rate of 82%. In the KNN-based classification algorithm, we use the CHI statistic to select feature words and attach the documents of the relevant category to these words, which solves the problem of redundant information. Using sentence features and term weighting, we have implemented statistics-based automatic text summarization (mechanical abstracting). Using the vector space model, combined with synonym expansion and an inverted file storage structure, we have implemented a simple information retrieval system.
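
The dictionary-driven segmentation mentioned in the abstract (hash storage plus forward maximum matching) can be illustrated with a minimal sketch. The abstract does not give the thesis's actual data structures, so everything here is an assumption for illustration: the function names `build_dictionary` and `forward_max_match`, the two-level layout (first character, then a set of words), the `max_len` limit, and the toy word list.

```python
def build_dictionary(words):
    """Build a two-level hash index: first character -> set of words.

    This layout is an assumed stand-in for the multi-level hash storage
    named in the abstract; the thesis's real structure is not described.
    """
    index = {}
    for word in words:
        if word:  # skip empty strings
            index.setdefault(word[0], set()).add(word)
    return index


def forward_max_match(text, index, max_len=4):
    """Segment text left to right, always taking the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        candidates = index.get(text[i], ())
        # Try the longest window first and shrink it until a word matches.
        for size in range(min(max_len, len(text) - i), 1, -1):
            piece = text[i:i + size]
            if piece in candidates:
                tokens.append(piece)
                i += size
                break
        else:
            # No multi-character word starts here: emit a single character.
            tokens.append(text[i])
            i += 1
    return tokens


# Toy dictionary (hypothetical entries, not taken from the thesis).
index = build_dictionary(["研究", "研究生", "生命", "起源"])
print(forward_max_match("研究生命起源", index))
```

On this toy input the greedy longest-match rule picks "研究生" first and leaves "命" as a single character, which shows the kind of segmentation ambiguity that a later disambiguation step would have to handle.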

Keywords/Search Tags:

Chinese information processing, Corpus, syntax decision tree

Related items

1	Chinese Grammar Corpus System Design
2	Short Text Similarity Research Based On Abstract Syntax Tree
3	Research On The Rule Excavation Method Based On Decision Tree In Automatic Identification Of Relation Words In Chinese Compound Sentences
4	Research On Corpus Parallel Processing In Chinese Proofreading
5	Research And Implementation Of Characteristics Of Complex Sentence Analyzer In Chinese Information Processing
6	Research On The Classified Method Of Inconsistency Of Segmentation For Chinese Corpus
7	Research On Question Processing Techniques Of Open-Domain Chinese Question Answering System
8	Research And Application Of Chinese Word Segmentation Based On English-Chinese Parallel Corpus
9	Research On Keeping Consistency Of Chinese Corpus Of Complete Parsing
10	Research On Automatic Disambiguation Method Of Tibetan Word Meaning Based On Chinese And Tibetan Parallel Corpus