Full Text Retrieval Research Based On The Feature Extraction And Conversion Method

Posted on:2015-03-07

Degree:Master

Type:Thesis

Country:China

Candidate:F L He

Full Text:PDF

GTID:2268330428967675

Subject:Computer application technology

Abstract/Summary:

Information retrieval technology has matured alongside the rapid development of the Internet, and the search engine has become an indispensable tool in people's daily lives. Traditional retrieval is keyword based: it finds relevant documents only by matching identical keywords. This often leads to errors in semantic understanding, and the approach is increasingly unable to meet users' needs or the demands of current research. It is therefore becoming more and more important to mine the deep semantic information in retrieval.

Chinese is ambiguous and its terms are highly interrelated, which at times produces uncertainty and ambiguity in natural language. For this reason, latent semantic analysis (LSA) is widely used in information retrieval. The core of LSA is to build a weighted matrix over words and documents and then apply a transformation to that matrix; the weighting function used has a direct impact on the results. A semantic matrix built in this way from word-to-word relationships largely eliminates the diversity and randomness that can cause deviations in search results. However, the method still ignores the ambiguity and uncertainty of language, so cloud model theory is introduced into information retrieval in an attempt to mine further latent semantic information.

The Latent Dirichlet Allocation (LDA) model is used to mine the underlying topical structure. Each topic is associated with a multinomial distribution over semantically related words. It is doubtful, however, whether the topics themselves are semantically related to one another. This thesis introduces cloud model theory into the LDA model and builds a new feature selection system. Drawing on the relationship between the mean and variance of the cloud model, an additional regulating factor is attached to the topic whenever a topic is assigned during sampling. The new method can thus extract the high-contribution feature set of a text. Results show that this feature set contains fewer features yet yields higher classification accuracy.

Words carry semantic information in different representations, so information from two kinds of semantic space cannot be integrated directly. This thesis presents a feature conversion mechanism that converts the two kinds of semantic information into the cloud space so that they are consistent. They are then integrated further in this uniform space, and a sample is selected under the labeled topic model. Finally, the integrated semantic information is applied to information retrieval through query expansion to improve the retrieval results.
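
The latent semantic analysis step described above (build a weighted word-document matrix, then transform it) can be illustrated with a minimal sketch. This is not the thesis's implementation: the TF-IDF weighting, the truncated SVD as the transformation, the toy corpus, and the function names `build_weighted_matrix` and `lsa_similarities` are all assumptions made for illustration.

```python
import numpy as np


def build_weighted_matrix(docs):
    """Return (vocab, matrix): a term-by-document matrix with TF-IDF weights.

    TF-IDF is an assumed weighting; the abstract does not name one.
    """
    vocab = sorted({word for doc in docs for word in doc.split()})
    row = {word: i for i, word in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.split():
            counts[row[word], j] += 1.0
    doc_freq = (counts > 0).sum(axis=1)       # number of documents containing each term
    idf = np.log(len(docs) / doc_freq) + 1.0  # +1 keeps terms shared by all documents
    return vocab, counts * idf[:, np.newaxis]


def lsa_similarities(docs, query, k=2):
    """Cosine similarity between a query and each document in a k-dim latent space.

    Assumes k does not exceed the rank of the matrix (no zero singular values kept).
    """
    vocab, matrix = build_weighted_matrix(docs)
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    u_k, s_k = u[:, :k], s[:k]
    doc_vecs = vt[:k, :].T                    # one latent vector per document

    # Fold the query into the latent space: q_hat = q^T U_k S_k^{-1}
    q = np.zeros(len(vocab))
    for word in query.split():
        if word in vocab:
            q[vocab.index(word)] += 1.0
    q_hat = (q @ u_k) / s_k

    sims = []
    for d in doc_vecs:
        denom = np.linalg.norm(q_hat) * np.linalg.norm(d)
        sims.append(float(q_hat @ d / denom) if denom > 0 else 0.0)
    return sims


# Hypothetical toy corpus: only the first document contains the word "car".
docs = [
    "car engine repair",
    "automobile engine service",
    "fruit apple banana",
    "banana apple juice",
]
for doc, sim in zip(docs, lsa_similarities(docs, "car", k=2)):
    print(f"{sim:+.3f}  {doc}")
```

In this toy corpus, plain keyword matching would return only the first document for the query "car". In the latent space the second document should score about as high as the first, because both share the term "engine", which is the kind of word-to-word relationship the abstract refers to.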

Keywords/Search Tags:

Information Retrieval, Topic Model, Cloud Model, Feature Item, Concept Labels, Relevancy

Related items

1	Research On Model Of Hot Topic Opinion Mining In Virtual Communities
2	Research On Techniques Of Text Retrieval Model Based On Semantic Analysis
3	Research On Private Data Retrieval Based On Topic Model In Cloud Storage
4	Cloud Image Retrieval Method Based On Topic Model
5	Research On The Music Classification Method Based On Correlated Topic Model
6	Research And Implementation Of The Topic Web Crawlers
7	Research On Requirements Trace Links Generation Method Based On Hybrid Information Retrieval Model
8	Research And Implementation Of Key Technology Of Data API Retrieval Platform Based On Topic
9	Research On Bilingual Topic Model And Its Algorithm In Cross-language Information Retrieval
10	Research And Application Of 3D CAD Model Retrieval Technology Based On Feature Information