Open Chinese Word Segmentation Based On Statistics

Posted on:2003-01-30

Degree:Master

Type:Thesis

Country:China

Candidate:H C Guan

Full Text:PDF

GTID:2208360065455536

Subject:Computer application technology

Abstract/Summary:

Automatic Chinese word segmentation is an important part of Chinese information processing. Statistics-based methods suffer from sparse training data, and further growth of the corpus is limited by the heavy workload of manual tagging. For statistics-based automatic Chinese word segmentation, this thesis introduces an open learning mechanism that combines supervised and unsupervised learning. The word segmentation model includes credibility revision and partial trigram information. The thesis then discusses several problems that arise during system implementation, such as the segmentation algorithm and the human-computer interface. The parameters and thresholds of the model are determined through experiments. The test results show that, with the open learning model, segmentation accuracy reaches 99.07% on the closed test and 98.08% on the open test, and that the system shows good adaptability and disambiguation ability.
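To make the idea of statistics-based segmentation concrete, the following is a minimal sketch only. It is not the model described in the thesis: it assumes a plain unigram word model with add-one smoothing (one simple way to cope with sparse training data) and a dynamic-programming search for the most probable segmentation. The function name `segment`, the `max_len` parameter, and the toy counts in `TOY_COUNTS` are all hypothetical and chosen for illustration.

```python
import math

# Hypothetical toy word counts; a real system would estimate these from a tagged corpus.
TOY_COUNTS = {"研究": 20, "研究生": 5, "生命": 15, "命": 2, "起源": 10, "生": 3}


def segment(text, counts, max_len=4):
    """Return the most probable segmentation of text under a smoothed unigram model."""
    total = sum(counts.values())
    denom = total + len(counts) + 1  # add-one smoothing, with one extra slot for unseen words

    def log_prob(word):
        # Unseen words get count 0, so smoothing keeps their probability nonzero.
        return math.log((counts.get(word, 0) + 1) / denom)

    n = len(text)
    # best[i] holds (best log-probability of text[:i], start index of the last word).
    best = [(0.0, 0)] + [(-math.inf, 0)] * n
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = text[j:i]
            # Candidates are dictionary words, plus single characters as a fallback.
            if word in counts or i - j == 1:
                score = best[j][0] + log_prob(word)
                if score > best[i][0]:
                    best[i] = (score, j)

    # Follow the back-pointers from the end of the text to recover the words.
    words = []
    i = n
    while i > 0:
        j = best[i][1]
        words.append(text[j:i])
        i = j
    return words[::-1]


print(" / ".join(segment("研究生命起源", TOY_COUNTS)))
```

With these toy counts the ambiguous string is resolved as 研究 / 生命 / 起源 rather than 研究生 / 命 / 起源, because the product of the smoothed word probabilities is higher for the first reading. A trigram-based model such as the one the abstract mentions would condition each word on its predecessors instead of scoring words independently.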

Keywords/Search Tags:

Natural Language Processing, Chinese Segmentation, Corpus, Grammar Model, Open Learning

Related items

1	Research And Application Of Chinese Word Segmentation Based On English-Chinese Parallel Corpus
2	The Methodology And Implementation Of Chinese Natural Language Query In Databases
3	Restricted Natural Language Query Interface Based On Semantic Dependence Grammar Analysis Model
4	Study On Chinese Word Segmentation Based On Recurrent Neural Network Language Model
5	Chinese Grammar Corpus System Design
6	A Study On Extraction Method Of Contemporary Chinese Commonly Used Words For Language Engineering Based On Dynamic Circulating Corpus
7	Deep Learning Based Automatic Grammar Error Correction
8	Research And Implementation Of Chinese Auto-segmentation System
9	A Technology Of Generating SQL Through Chinese Natural Language Queries Based On Deep Learning
10	Evaluating grammar formalisms for applications to natural language processing and biological sequence analysis