Full Text Retrieval Research Based On The Feature Extraction And Conversion Method

Posted on: 2015-03-07 | Degree: Master | Type: Thesis
Country: China | Candidate: F L He
GTID: 2268330428967675 | Subject: Computer application technology
Abstract/Summary:
Information retrieval technology has matured alongside the rapid development of the Internet, and the search engine has become an indispensable tool in people's daily lives. Traditional retrieval is keyword-based: it finds relevant documents only by matching identical keywords. This often leads to semantic misunderstanding, and the approach is increasingly unable to meet users' needs or the demands of current research. It is therefore increasingly important to mine the deeper semantic information in retrieval.

The ambiguity and relatedness of words in Chinese make natural language uncertain and ambiguous at times, so latent semantic analysis (LSA) is widely used in information retrieval. The core of LSA is to build a weighted matrix over words and documents and then apply a transformation to that matrix; the weighting function used has a direct impact on the results. A semantic matrix built in this way on word-to-word relationships largely eliminates the diversity and randomness that can skew search results. However, the method still ignores the fuzziness and uncertainty of language, so cloud model theory is introduced into information retrieval in an attempt to mine latent semantic information.

The Latent Dirichlet Allocation (LDA) model is used to mine the underlying topical structure. Each topic is associated with a multinomial distribution over semantically related words. However, whether the topics are related to one another semantically is open to question. This thesis introduces cloud model theory into the LDA model and builds a new feature selection system.
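
As a rough illustration of the cloud model mentioned above, the sketch below implements the standard forward normal cloud generator. It is not taken from the thesis: the function name `normal_cloud_drops` and the parameter names `ex` (expectation), `en` (entropy) and `he` (hyper-entropy) are assumptions that follow the usual cloud model notation, and how the thesis maps its "mean" and "variance" onto these parameters is not stated in the abstract.

```python
import math
import random


def normal_cloud_drops(ex, en, he, n, seed=None):
    """Generate n cloud drops (x, membership) with a forward normal cloud generator.

    ex, en, he are the expectation, entropy and hyper-entropy of the concept.
    These names are illustrative assumptions, not the thesis's notation.
    """
    rng = random.Random(seed)
    drops = []
    for _ in range(n):
        # Perturb the entropy: En' ~ N(En, He^2).
        en_prime = rng.gauss(en, he)
        # Draw the drop position: x ~ N(Ex, En'^2).
        x = rng.gauss(ex, abs(en_prime))
        if en_prime == 0:
            # Degenerate case: the drop sits exactly on the expectation.
            membership = 1.0
        else:
            # Certainty degree of x belonging to the concept.
            membership = math.exp(-((x - ex) ** 2) / (2 * en_prime ** 2))
        drops.append((x, membership))
    return drops


drops = normal_cloud_drops(ex=0.0, en=1.0, he=0.1, n=1000, seed=42)
```

Each drop pairs a sampled value with a membership degree in [0, 1]. A larger `he` makes the cloud thicker, which is one way to express the fuzziness and randomness of a concept at the same time.
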
Because of the relationship between the mean and the variance in the cloud model, the method adds a regulator to the topic when labelling a topic during sampling. The new method can therefore extract a feature set of the text with a high contribution. Results show that this feature set has fewer features but yields higher classification accuracy.

Words carry different representations of semantic information, and information from the two kinds of semantic space cannot be integrated directly. The thesis presents a feature conversion mechanism that converts the two kinds of semantic information into the cloud space so that they are consistent. They are then integrated further in this unified space, and sampling is performed on the label topic model. Finally, the integrated semantic information is applied to information retrieval through query expansion to improve the retrieval results.
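
The weighted word-document matrix at the core of latent semantic analysis, described earlier in the abstract, can be sketched as follows. The abstract does not say which weighting function the thesis uses, so TF-IDF is assumed here purely for illustration; the function name `tfidf_matrix` and the sample documents are hypothetical. The later transformation of the matrix (typically a truncated singular value decomposition in LSA) is omitted.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, matrix); matrix[i][j] is the TF-IDF weight of term i in doc j.

    TF-IDF is an assumed weighting; the thesis's own function is not given.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for doc in tokenized for term in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in tokenized for term in set(doc))
    counts = [Counter(doc) for doc in tokenized]
    matrix = []
    for term in vocab:
        idf = math.log(n_docs / df[term])
        # Term frequency is normalised by document length.
        row = [
            (count[term] / len(doc)) * idf if doc else 0.0
            for count, doc in zip(counts, tokenized)
        ]
        matrix.append(row)
    return vocab, matrix


docs = [
    "topic model for text retrieval",
    "cloud model for uncertain concepts",
    "query expansion improves retrieval",
]
vocab, weights = tfidf_matrix(docs)
```

Rows correspond to words and columns to documents. A word that occurs in every document receives weight zero under this scheme, which shows how directly the choice of weighting function shapes the matrix that is later transformed.
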
Keywords/Search Tags: Information Retrieval, Topic Model, Cloud Model, Feature Item, Concept Labels, Relevancy