
Detecting Uncertainty Information In Natural Language

Posted on: 2016-10-25
Degree: Master
Type: Thesis
Country: China
Candidate: W M Yang
Full Text: PDF
GTID: 2308330464454807
Subject: Computer Science and Technology
Abstract/Summary:
Natural language texts contain a great deal of uncertain information and many ambiguous words. Sentences that contain hedges are considered uncertain. Such uncertain information should be distinguished from certain information, or given special processing. Uncertainty information detection is the task of identifying uncertain information in natural language texts, and it is important for text information extraction. Detecting such information has therefore become an active research topic in information retrieval. However, uncertainty detection in Chinese is still rarely studied, and existing research has focused on English texts. This thesis therefore uses a support vector machine (SVM) model to study uncertainty information in Chinese texts and treats this model as a baseline. In addition, we carry out experiments with a conditional random field (CRF) model in order to explore models for uncertainty information recognition.

We use the SVM model, which is suited to nonlinear, high-dimensional, small-sample problems, to cast Chinese uncertainty information recognition as a classification problem. We train a classifier on the corpus published by Fudan University and then classify new texts with the trained classifier to verify the effectiveness of the proposed SVM-based method. One problem remains unsolved: the choice of kernel function and parameters lacks theoretical guidance, which limits classification performance.

We also use the CRF model to identify uncertainty information. Exploiting the sequence labeling nature of CRF, we treat the recognition of uncertainty information as deciding whether each node is the boundary of a hedge, and on this basis propose an uncertainty information recognition model based on conditional random fields. We experiment with word features and lexical features. Adding lexical features improves the performance of the system. We also train and test with template windows of size 1, 3, and 5; the results show that the system performs best when the template window is 3. The CRF model takes full advantage of contextual information together with lexical features, which to some extent solves the problems caused by the characteristics of Chinese words. Compared with the SVM-based system, the CRF-based system performs much better. Our study can be applied to many natural language processing tasks, can provide an important resource for factual information extraction, and offers a new approach to uncertainty information recognition.
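
As a rough illustration of the sequence labeling setup described in the abstract, the sketch below marks hedge boundaries on a segmented sentence and builds per-token features from a symmetric context window. It is not the thesis's implementation. The B/I/O label scheme, the reading of "template window" as the current token plus an equal number of neighbors on each side, the use of part-of-speech tags as the lexical feature, the example sentence, and the function names `bio_labels` and `window_features` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def bio_labels(n_tokens: int, hedge_spans: List[Tuple[int, int]]) -> List[str]:
    """Label each token B (hedge start), I (inside a hedge), or O (outside).

    hedge_spans holds (start, end) token indices with end exclusive.
    The B/I/O scheme is an assumed encoding of "hedge boundary" decisions.
    """
    labels = ["O"] * n_tokens
    for start, end in hedge_spans:
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return labels


def window_features(
    words: List[str], tags: List[str], i: int, window: int
) -> Dict[str, str]:
    """Collect word and tag features for token i from an odd-sized window.

    A window of 3 covers offsets -1, 0, +1 (assumed interpretation).
    Positions outside the sentence are filled with a padding symbol.
    """
    half = window // 2
    feats: Dict[str, str] = {}
    for offset in range(-half, half + 1):
        j = i + offset
        inside = 0 <= j < len(words)
        feats[f"w[{offset}]"] = words[j] if inside else "<PAD>"
        feats[f"t[{offset}]"] = tags[j] if inside else "<PAD>"
    return feats


# Hypothetical segmented sentence ("He may come tomorrow"); the hedge is 可能.
words = ["他", "可能", "明天", "来"]
tags = ["r", "v", "t", "v"]  # illustrative part-of-speech tags, not from the thesis
labels = bio_labels(len(words), [(1, 2)])  # ['O', 'B', 'O', 'O']

# One feature dictionary per token for each window size named in the abstract.
features_by_window = {
    w: [window_features(words, tags, i, w) for i in range(len(words))]
    for w in (1, 3, 5)
}
```

Feature dictionaries of this kind, paired with the label sequence, are the usual input shape for a CRF toolkit. Comparing window sizes 1, 3, and 5 then amounts to training once per entry in `features_by_window`.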
Keywords/Search Tags: uncertainty information recognition, SVM, CRF, classification