Detecting Uncertainty Information In Natural Language

Posted on:2016-10-25

Degree:Master

Type:Thesis

Country:China

Candidate:W M Yang

Full Text:PDF

GTID:2308330464454807

Subject:Computer Science and Technology

Abstract/Summary:

Natural language texts contain a great deal of uncertain information and many ambiguous words. Sentences that contain hedges are considered uncertain, and such information should be distinguished from certain information or given special processing. Uncertainty information detection is the task of identifying uncertain information in natural language texts. It is important for text information extraction and has become a hot topic in information retrieval. However, research on uncertainty detection in Chinese is still rare, and existing work has focused on English texts. This thesis therefore uses a support vector machine (SVM) model to study uncertainty information in Chinese texts and treats it as a baseline. In addition, we carry out experiments with a conditional random field (CRF) model in order to explore models for uncertainty information recognition.

We use the SVM model, which has advantages in handling nonlinear, high-dimensional problems and small samples, to cast Chinese uncertainty information recognition as a classification problem. We train a classifier on a corpus published by Fudan University and then classify new texts with the trained classifier to verify the effectiveness of the proposed SVM-based method. One problem remains unsolved: the choice of kernel function and parameters lacks theoretical guidance, which limits classification performance.

We also use the CRF model to identify uncertainty information. Exploiting the sequence labeling nature of CRFs, we treat the recognition of uncertainty information as deciding whether each node is the boundary of a hedge, and propose an uncertainty information recognition model based on conditional random fields. We run experiments with word features and with lexical features. Adding lexical features improves the performance of the system. We also train and test with feature template windows of size 1, 3, and 5; the results show that the system performs best with a window of 3. The CRF model takes full advantage of contextual information together with lexical features, which to some extent addresses problems caused by the characteristics of Chinese words. Compared with the SVM-based system, the CRF-based system performs considerably better. Our study can be applied to many natural language processing tasks: it can provide an important resource for factual information extraction and offers a new approach to uncertainty information recognition.
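
To make the sequence labeling formulation more concrete, the following is a minimal sketch of how hedge boundaries and window-based features might be prepared for a CRF. It is an illustration under assumptions, not the thesis's implementation: the abstract does not state the tag set, the exact feature templates, or the CRF toolkit. The B/I/O labels, the helper names `bio_labels` and `window_features`, the padding symbol, the example sentence, and its tags are all hypothetical. A window of 3 is assumed here to mean the current token plus one neighbor on each side.

```python
from typing import Dict, List, Tuple


def bio_labels(tokens: List[str], cue_spans: List[Tuple[int, int]]) -> List[str]:
    """Mark hedge boundaries: B = first token of a hedge, I = inside, O = outside.

    cue_spans holds (start, end) token indices, end exclusive (assumed format).
    """
    labels = ["O"] * len(tokens)
    for start, end in cue_spans:
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return labels


def window_features(tokens: List[str], tags: List[str], i: int, window: int) -> Dict[str, str]:
    """Collect word and tag features from a centered window around position i.

    window is the total window size (1, 3, or 5); positions that fall
    outside the sentence are filled with a padding symbol.
    """
    half = window // 2
    feats: Dict[str, str] = {}
    for off in range(-half, half + 1):
        j = i + off
        inside = 0 <= j < len(tokens)
        feats[f"w[{off}]"] = tokens[j] if inside else "<PAD>"
        feats[f"t[{off}]"] = tags[j] if inside else "<PAD>"
    return feats


# Hypothetical pre-segmented sentence: "he / may / tomorrow / come".
tokens = ["他", "可能", "明天", "来"]
tags = ["r", "v", "t", "v"]      # hypothetical tags, for illustration only
cue_spans = [(1, 2)]             # "可能" (may) is treated as the hedge

labels = bio_labels(tokens, cue_spans)
features = [window_features(tokens, tags, i, 3) for i in range(len(tokens))]

for tok, lab, feats in zip(tokens, labels, features):
    print(tok, lab, feats)
```

Each token's feature dictionary and label could then be passed to any CRF trainer; changing the `window` argument to 1 or 5 gives the other two settings mentioned in the abstract.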

Keywords/Search Tags:

uncertainty information recognition, SVM, CRF, classification

Related items

1	A Study On Sentence Uncertainty Identification And Classification
2	Cascade Classification And Multi-label Classification Used In Chinese Folk Instruments Recognition
3	Study Of Image Active Classification Method Based On Resampling Thought
4	An Possibility Fusion Method Based On The Uncertainty Of Multi-source Information
5	Multi-scale Assessment For Uncertainty Of Classification Of Remote Sensing Image Based On Information Theory And Rough Set
6	Measuring Uncertainty Of Rough Sets And Its Application In Text Classification
7	The Investigation Of Uncertainty Sources Of Immunity To Conducted Disturbances, Induced By Radio-Frequency Fields
8	Research On Radar Object Classification Algorithm Based On Kinetic Information And RCS Features
9	Adaptive Fuzzy Logic based framework for handling imprecision and uncertainty in pattern classification of bioinformatics datasets
10	Research On Fusion Approach Of Conflicting Uncertainty Information And Its Application In Fault Diagnosis