
Research On Shallow Linguistic Parsing And Sentence Oriented Novelty Detection

Posted on: 2006-02-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H P Zhang
Full Text: PDF
GTID: 1118360185995710
Subject: Computer software and theory
Abstract/Summary:
The dissertation addresses sentence oriented retrieval and novelty detection. It aims to meet the demand for concise information with small granularity and little redundancy. It proposes partial linguistic parsing as a technique for novelty detection and discusses how to model sentence retrieval. The author also introduces several ways to quantify the degree of novelty in a given list of information. The Noovel system, built on the techniques discussed here, outperforms every published result in the international official evaluations and in several groups of experiments. This indicates that the techniques applied in the system are effective for sentence retrieval and novelty detection.

Shallow English language parsing customized for novelty detection includes sentence boundary detection, tokenization, part-of-speech tagging, and morphological analysis. The author adapts previous work in related natural language processing to the problem of novelty detection. For shallow Chinese language parsing, the dissertation employs a hierarchical hidden Markov model to incorporate word segmentation, part-of-speech tagging, segmentation disambiguation and unknown word recognition into a unified framework. Based on the results of linguistic analysis, query analysis filters out supplementary words, classifies the query tendency, automatically infers the user's query intention and generates a computable query vector from a topic description written in natural language. Experiments on all available test data sets were run to verify the contribution of shallow linguistic parsing. Surprisingly, sentence retrieval based on shallow linguistic parsing achieved the best performance.

For sentence retrieval, Noovel applies three modeling approaches: the vector space model, the probabilistic retrieval model and the language model. Because a single sentence contains little text, several query expansion strategies were tried, including semantic query expansion using WordNet, pseudo feedback, and local co-occurrence expansion. Experiments on the TREC 2003 data set show that a simple vector space model with shallow linguistic parsing can outperform every previously published result. Semantic expansion has little effect on retrieval because of the limited semantic resources and analysis depth; local co-occurrence expansion, however, helps with query and document expansion.

Sentence oriented novelty detection is the goal of the research. It is a temporal task. Building on previous work, the dissertation summarizes three approaches to modeling the degree of information novelty: weighted word overlapping, similarity comparison, and information increment. The motivation is to locate new information by considering both relevance to the topic and comparison with the history.

Besides unsupervised novelty detection, the dissertation also discusses how to perform...
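As a rough illustration of the similarity comparison approach named above, the following minimal sketch marks a sentence as novel when its highest cosine similarity to any earlier sentence stays below a threshold. It is not the Noovel implementation: the function names, the regular-expression tokenizer (a crude stand-in for shallow linguistic parsing), the raw term-frequency weighting and the 0.8 threshold are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase and keep alphanumeric runs (assumed stand-in for shallow parsing)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def novel_sentences(sentences, threshold=0.8):
    """Return the sentences judged novel, processing them in time order.

    A sentence is novel when its maximum similarity to every earlier
    sentence (novel or not) is below the hypothetical threshold.
    """
    history = []  # term-frequency vectors of all sentences seen so far
    novel = []
    for s in sentences:
        vec = Counter(tokenize(s))
        max_sim = max((cosine(vec, h) for h in history), default=0.0)
        if max_sim < threshold:
            novel.append(s)
        history.append(vec)
    return novel
```

Calling `novel_sentences(["The cat sat on the mat.", "The cat sat on the mat!", "Stock prices fell sharply today."])` would compare each sentence only against those that came before it, which reflects the temporal nature of the task. A real system would first restrict the input to sentences relevant to the topic.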
Keywords/Search Tags: Sentence retrieval, Novelty detection, Shallow linguistic parsing, Information retrieval, Information filtering, Query analysis, Query expansion, Natural language parsing, Chinese word segmentation, Part-of-speech tagging, Noovel