The Research Of Semantic Measurement In Text Information Retrieval

Posted on: 2013-10-04
Degree: Master
Type: Thesis
Country: China
Candidate: L P Yi
Full Text: PDF
GTID: 2248330371970891
Subject: Computer Science and Technology
Abstract/Summary:
Information retrieval is a technology that has emerged with the development of science and technology and the surge in the amount of information, and it plays an increasingly important role in people's work and life. Most of the information used in daily life is represented as text, so text information retrieval is the most common form. Faced with a large amount of information, retrieving the information that meets users' demands is an important problem.

When querying text information, the first step is to choose an appropriate data model to abstract the text. In this paper, we use the Vector Space Model to extract a feature vector from each text, so that the similarity of two texts can be measured by comparing their feature vectors.

To match two text feature vectors, this paper uses a semantic measure based on diffusion maps. In this way, high-dimensional text feature vectors are reduced to a low-dimensional data space, while the diffusion distance maintained during the diffusion process keeps the semantics of the data invariant. This paper defines a two-value method for the data processed by diffusion maps: if the p-th component of a vector is greater than or equal to the p-th component of the average of all the vectors, its value is set to +1; otherwise its value is set to -1.

At retrieval time, classifiers are trained on the two-value vectors just obtained and are then used to classify the query documents, so that each query document can also be represented by a two-value vector. Through this method, file retrieval can be sped up.

We use Reuters-21578, 20 Newsgroups, and TDT2 as test datasets to validate the performance of the proposed method, Diffusion Maps-Support Vector Machine. The experimental results show that the method has high search efficiency.
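
The two-value rule described in the abstract can be illustrated with a minimal sketch. It assumes the documents have already been reduced to low-dimensional vectors by diffusion maps; the sample vectors and the function names `binarize` and `hamming` are hypothetical. Comparing codes by Hamming distance is also an assumption made for illustration, since the abstract does not state how the two-value vectors are compared.

```python
def binarize(vectors):
    """Map each vector to a +1/-1 code using the per-component mean as threshold.

    Component p becomes +1 if it is >= the p-th component of the average
    of all vectors, and -1 otherwise (the rule stated in the abstract).
    """
    n = len(vectors)
    dims = len(vectors[0])
    means = [sum(v[p] for v in vectors) / n for p in range(dims)]
    return [[1 if v[p] >= means[p] else -1 for p in range(dims)]
            for v in vectors]


def hamming(a, b):
    """Count the positions where two codes differ (assumed comparison)."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical low-dimensional embeddings of three documents.
embedded = [[1.0, 4.0], [3.0, 2.0], [2.0, 3.0]]
codes = binarize(embedded)  # per-component means are [2.0, 3.0]
```

Because each document is reduced to a short vector of +1/-1 values, comparing two documents only requires counting differing positions, which is one plausible reason such codes can speed up retrieval.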
Keywords/Search Tags: Text Information Retrieval, Semantic Measurement, Diffusion Maps, Support Vector Machine