Research Of Chinese Word Segmentation In BERSE

Posted on:2007-10-09

Degree:Master

Type:Thesis

Country:China

Candidate:L D Huang

Full Text:PDF

GTID:2178360185977004

Subject:Education Technology

Abstract/Summary:

Automatic word segmentation is the basis of natural language processing (NLP). Every Chinese information processing system that operates on words must depend on word segmentation. The main difficulties in Chinese word segmentation are the handling of crossing ambiguities and unknown words. Based on the BERSE (Basic Education Resource Research Engine) project, I discuss methods of Chinese word segmentation in this thesis.

To resolve crossing ambiguities, I identified the distinguishing features of high-frequency characteristic words and present a method for handling crossing ambiguities. As a supporting technique, the system implementation uses character bigrams for ambiguity resolution.

To identify Chinese names, I adopted an approach that combines statistics and rules: using statistical measures, I analyzed the characteristics of names and present a method for sorting names. In the system design, Chinese names are found in the fragments left over by word segmentation; experiments show an accuracy of about 90%.

I also tried to use high-frequency words as a solution to the unknown-word problem. The segmentation system counts successful matches and the documents in which they occur to detect high-frequency strings, and then uses weights to add these unknown words to the vocabulary. The main vocabulary can therefore be extended automatically.

Finally, the framework, workflow, and interface design of the Chinese word segmentation system are described.
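
The abstract does not give the details of the bigram-based disambiguation, so the following is only a minimal sketch of one common way character bigrams can resolve a crossing ambiguity, not the thesis's actual method. It assumes that forward and backward maximum matching are used to expose the ambiguity, and that the preferred segmentation is the one whose word boundaries split the least frequent character pairs. The vocabulary, the toy corpus, and all function names (`max_match`, `char_bigram_counts`, `cut_cost`, `segment`) are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def max_match(text, vocab, max_len=4, reverse=False):
    """Greedy maximum matching, scanning forward or (if reverse) backward."""
    words = []
    remaining = text
    while remaining:
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(remaining)), 0, -1):
            piece = remaining[-size:] if reverse else remaining[:size]
            if size == 1 or piece in vocab:
                break
        if reverse:
            words.insert(0, piece)
            remaining = remaining[:-size]
        else:
            words.append(piece)
            remaining = remaining[size:]
    return words


def char_bigram_counts(corpus):
    """Count adjacent character pairs in an iterable of sentences."""
    counts = Counter()
    for sentence in corpus:
        for left, right in zip(sentence, sentence[1:]):
            counts[left + right] += 1
    return counts


def cut_cost(words, bigrams):
    """Total corpus count of the character pairs split by word boundaries."""
    return sum(bigrams[a[-1] + b[0]] for a, b in zip(words, words[1:]))


def segment(text, vocab, bigrams):
    """Return the cheaper of the forward and backward segmentations."""
    forward = max_match(text, vocab)
    backward = max_match(text, vocab, reverse=True)
    if forward == backward:
        return forward  # no crossing ambiguity detected
    return min(forward, backward, key=lambda ws: cut_cost(ws, bigrams))


# Toy data, assumed for illustration only.
VOCAB = {"研究", "研究生", "生命"}
CORPUS = ["生命科学", "热爱生命", "珍惜生命", "研究生"]
BIGRAMS = char_bigram_counts(CORPUS)

if __name__ == "__main__":
    print(segment("研究生命", VOCAB, BIGRAMS))
```

With this toy data, forward matching yields 研究生 / 命 and backward matching yields 研究 / 生命. The pair 生命 is more frequent in the toy corpus than 究生, so cutting between 究 and 生 is cheaper and the backward result is chosen. A real system would need smoothed statistics from a large corpus and a way to handle ties.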

Keywords/Search Tags:

Chinese information processing, Chinese word segmentation, crossing ambiguity, unknown word

Related items

1	Research Of Combined Chinese Word Segmentation Method
2	Chinese Word Auto-segmentation Design And Algorithm Realization For Chinese Network Information Retrieval
3	Study On Disambiguation Algorithm For Chinese Word Segmentation
4	Word Segmentation And POS Tagging In Chinese
5	The Research Of Chinese Word Segmentation Disambiguation Based On Word Environment Information
6	Research On Overlapping Ambiguity Treatment For Chinese Word Segmentation
7	Based On The Understanding Of The Chinese Word System Design And Realization
8	Chinese Word Segmentation Algorithm Based On Ontology Research And Implementation
9	Chinese Word Segmentation Technology Research Based On Lucene
10	Reverse Backtracking Research Of Chinese Segmentation Based On Last Word Dictionary