Information Retrieval Oriented Analysis Of Text Content

Posted on:2008-09-07

Degree:Doctor

Type:Dissertation

Country:China

Candidate:Y Hu

Full Text:PDF

GTID:1118360242476098

Subject:Computer software and theory

Abstract/Summary:


Information retrieval is a key problem in information services and one of the main means by which people cope with the "information explosion". Research on how to organize and search information automatically and effectively has considerable theoretical and practical value for making use of large-scale information. Retrieval research covers retrieval models, information processing, and their applications. This thesis presents methods for each of these problems, and the object processed throughout is text data. First, it presents a retrieval model based on recursive conceptual graphs. Second, it explores an approach to extracting conceptual knowledge in the form of (attribute, value) structures from a machine-readable dictionary, and proposes a method for automatically constructing relations between concepts in unstructured text, labeled by attribute names. Finally, it explores text clustering and sentiment analysis for text information processing.

Specifically, this thesis makes the following contributions to information retrieval:

(1) A recursive conceptual graph formalism is presented to describe the meaning of document contents and user queries in a specified domain. The formalism is defined on the (attribute, value) structure. It uses nested conceptual graphs that correspond to combinations of syntactic constituents to map syntactic structure onto semantic structure. This parallelism could make it possible to synchronize semantic analysis with syntactic analysis in future work. Using this recursive representation, the thesis indexes a set of documents and queries and proposes a new graph comparison algorithm to address the relevance problem.

(2) A Chinese machine-readable dictionary is exploited to extract conceptual knowledge, namely (attribute, value) structures, from the definitions of nominal entries.
A comparison with previous work on acquiring word knowledge from free text and from dictionaries shows that a dictionary is an advantageous resource for extracting discriminative knowledge about concepts. Our method focuses on constructing attribute-value extraction patterns and on the statistical decision of when to apply them. The work is therefore designed as a new three-step procedure, unlike previous dictionary extraction studies, which first parse the definitions.

(3) To serve conceptual graph based retrieval, this thesis explores a bootstrapping method that automatically extracts semantic patterns from a large-scale corpus in order to identify three relations between Chinese concepts in context. Our contributions, which distinguish it from other bootstrapping methods, are the introduction of a bi-sequence alignment algorithm from bioinformatics to generate candidate patterns, and a new metric for pattern confidence that improves extraction quality in the next iteration. For the automatic recognition of these three relations, experiments show that the pattern set generated by our method achieves higher coverage and precision than that of DIPRE.

(4) This thesis presents a new text similarity measure for text clustering that combines the cosine measure with quantified conceptual relations by linear interpolation. These relations are derived from dictionary entries and the words in their definitions, and are quantified under the assumption that an entry and its definition are equivalent in meaning. The relations are treated as "knowledge" for text clustering. Within the k-means framework, the new interpolated similarity significantly improves clustering performance when optimizing both hard and soft criterion functions.
The results show that introducing conceptual relations from an unstructured dictionary into the similarity measure benefits text clustering.

(5) This thesis presents a generative model for sentiment analysis based on the language modeling approach. By characterizing the semantic orientation of documents as "favorable" or "unfavorable", the method captures subtle information needed in text retrieval. To this end, the thesis explores both global and local language modeling approaches. For global modeling, it uses the Kullback-Leibler divergence between the language model estimated from a test document and each of the two trained sentiment models. For local analysis, it uses a triggered language model to exploit the dependency links between a domain "term" and other ordinary words in its context. The improved results motivate us to look for more suitable language models for sentiment detection in future research.
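Contribution (4) describes the clustering similarity only as a linear interpolation of the cosine measure with dictionary-derived conceptual relations. The sketch below illustrates that idea under stated assumptions. The relation weight 1/len(definition), the averaged cross-term score `relation_sim`, treating identical terms as fully related, and the default interpolation weight `lam=0.7` are all hypothetical choices made for illustration, not the quantification used in the thesis.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def relation_matrix(dictionary):
    """Hypothetical quantification: each (entry, definition word) pair gets
    weight 1/len(definition), stored symmetrically."""
    rel = {}
    for entry, definition in dictionary.items():
        words = definition.split()
        for word in words:
            if word == entry:
                continue
            weight = 1.0 / len(words)
            for pair in ((entry, word), (word, entry)):
                rel[pair] = max(rel.get(pair, 0.0), weight)
    return rel

def relation_sim(u, v, rel):
    """Weighted average relation strength over all cross-document term pairs.
    Identical terms count as fully related (weight 1), so the result is in [0, 1]
    for nonnegative term weights."""
    total_u, total_v = sum(u.values()), sum(v.values())
    if not total_u or not total_v:
        return 0.0
    score = sum(wu * wv * (1.0 if a == b else rel.get((a, b), 0.0))
                for a, wu in u.items() for b, wv in v.items())
    return score / (total_u * total_v)

def interpolated_sim(u, v, rel, lam=0.7):
    """Linear interpolation of cosine and relation similarity (lam is hypothetical)."""
    return lam * cosine(u, v) + (1.0 - lam) * relation_sim(u, v, rel)

rel = relation_matrix({"car": "motor vehicle"})
d1, d2 = Counter(["car"]), Counter(["vehicle"])
print(cosine(d1, d2), interpolated_sim(d1, d2, rel))
```

In this toy case the two one-word documents share no term, so their cosine similarity is 0, while the interpolated similarity is positive because the dictionary links "car" to "vehicle". Because `interpolated_sim` has the same inputs and output as `cosine`, it could be substituted in the assignment step of a k-means loop; the thesis's hard and soft criterion functions are not reproduced here.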

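Contribution (5) classifies a document by comparing its language model with two trained sentiment models using the Kullback-Leibler divergence. The following is a minimal Python sketch of that global approach only. It assumes unigram models with add-one (Laplace) smoothing. The smoothing scheme, the toy training sentences, and all function names are illustrative assumptions, and the triggered language model used for the local analysis is not shown.

```python
import math
from collections import Counter

def train_unigram(docs, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model over a fixed vocabulary."""
    counts = Counter(tok for d in docs for tok in d.split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def doc_model(doc):
    """Maximum-likelihood unigram model of a single test document."""
    counts = Counter(doc.split())
    n = sum(counts.values())
    return {w: c / n for w, c in counts.items()}

def kl(p, q):
    """KL(p || q), summed over the support of p; q must be positive there."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items())

def classify(doc, models):
    """Return the label whose sentiment model is closest to the document model."""
    p = doc_model(doc)
    return min(models, key=lambda label: kl(p, models[label]))

favorable = ["great phone excellent screen", "excellent battery great value"]
unfavorable = ["poor battery terrible screen", "terrible value poor support"]
tests = ["great screen excellent", "terrible poor support"]
# Simplification: the vocabulary also covers the test documents so that every
# test token has nonzero smoothed probability under both models.
vocab = {tok for d in favorable + unfavorable + tests for tok in d.split()}
models = {"favorable": train_unigram(favorable, vocab),
          "unfavorable": train_unigram(unfavorable, vocab)}
for t in tests:
    print(t, "->", classify(t, models))
```

Since KL(p || q) equals the sum of p log p minus the sum of p log q, and the first term depends only on the test document, minimizing the divergence over the two sentiment models is equivalent to choosing the model with the highest expected log-probability under the document model. A real system would fix the vocabulary at training time and handle unseen words separately.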
Keywords/Search Tags:

retrieval model, dictionary extraction, conceptual relation construction, text clustering, sentiment analysis


Related items

1	Research Of Conceptual Relation Extraction Based On Topic-Text Paragraph
2	Research On The Self-construction Method Of Emotional Dictionary Oriented To Specific Topics In Social Networks
3	News-text Sentiment Classification Based On JST Model
4	Research On The Construction Of Sentiment Analysis Model Of Long Text Driven By Reading Context
5	Research On Text Sentiment Analysis Combining Sentiment Dictionary And Neural Network
6	The Research On Construction Of Conceptual Network From Dictionary
7	Conceptual Graph Based Text Retrieval In Specified Domain
8	Research And Application Of Person Figure Mining Based On Text Analysis
9	Research On Chinese Short-text Sentiment Analysis
10	Research On Short Text Sentiment Analysis Technology Based On Extended Sentiment Dictionary