An Improved Text Similarity Model Based on PageRank Value

Posted on: 2011-07-15
Degree: Master
Type: Thesis
Country: China
Candidate: H Tian
Full Text: PDF
GTID: 2178360305988611
Subject: Computer application technology
Abstract/Summary:
As one of the core technologies of the Internet, search engines have made great contributions to its development. The purpose of a search engine is to provide users with efficient search results, that is, to let users find the information they need on the complex World Wide Web faster and more comprehensively, reliably, and accurately. The Internet is changing rapidly, so search engine technology must be developed constantly in order to meet the changing needs of users.

This paper improves TF/IDF, which is widely used in the Vector Space Model (VSM), and proposes a new method that uses the PageRank value in text classification. The new method is named "an improved text similarity model based on PageRank value". The main research of this paper includes the following four points:

1. Taking into account the special circumstances of the network, we improved the statistical method for word frequency (the TF method), so that word frequency better serves retrieval.
2. We improved the calculation of inverse text frequency (the IDF method) by considering the impact of different text types on the calculation, so that the information finally extracted is more valuable.
3. We combined the improved TF and IDF methods to improve the vector similarity model.
4. We verified the improved vector similarity model. Analysis of a large amount of experimental data found that the improved model contributes to retrieval quality and efficiency.

The improved model first classifies the texts preliminarily and then, taking the different types of information into account, uses the improved VSM to sort the classified texts. To make the improved method easy to apply in practice, this paper presents a structure that uses middleware to integrate seamlessly with the original system, and designs the related middleware, the User Interface.

The experimental stage has three steps. First, the artificial retrieval library is searched and the results are tallied. Second, the improved method is used to search those results a second time. Finally, the two sets of search results are compared and analyzed. The experimental analysis focuses on relevance, excellent rate, and new-word accuracy rate. The experimental results show that the improved model improves retrieval effectiveness, enabling users to find the content they need more easily.
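For orientation, the baseline that the thesis sets out to improve can be sketched as follows. This is a minimal illustration of standard TF/IDF weighting with cosine similarity in a vector space model, not the improved TF, IDF, or PageRank-based method, whose formulas are not given in this abstract. The function names, the whitespace tokenization, and the sample documents are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document (textbook TF/IDF)."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # TF = relative frequency in the document; IDF = log(N / df).
        vectors.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, tokenized by whitespace.
docs = [
    "search engine ranking".split(),
    "search engine retrieval".split(),
    "text classification model".split(),
]
vecs = tf_idf_vectors(docs)
```

Here `cosine_similarity(vecs[0], vecs[1])` is positive because the two documents share terms, while `cosine_similarity(vecs[0], vecs[2])` is zero because they share none. An improved model of the kind described above would change how the term weights are computed, leaving the vector comparison step in place.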
Keywords/Search Tags: Search Engine, Vector Similarity Model (VSM), TF/IDF