
The Research Of The Personalized Search Engine

Posted on: 2011-07-23; Degree: Master; Type: Thesis
Country: China; Candidate: X G Guo; Full Text: PDF
GTID: 2178360305955065; Subject: Computer software and theory
Abstract/Summary:
With the rapid development of information technology and the Internet, the amount of information on the network has grown explosively, and the Web has become an important way for people to access information. At the same time, the content of Web resources is complex and the Web is loosely organized, so users need information retrieval systems to find what they are looking for. Search engines emerged to meet this need and have become an important online tool for retrieving information. However, search engines are increasingly unable to fully satisfy users. For example, traditional search engines rely only on keyword matching and do not take differences between users into account, so they return the same results to every user. Other issues also leave users less than fully satisfied. In this paper, the author points out some shortcomings of traditional search engines and, building on existing research, studies the personalized Web search engine and proposes corresponding solutions.

A personalized search engine should have personalization features based on the user's background, hobbies, retrieval tasks and purposes, search history, and preferences for search results. Relying on these features, a personalized search engine can provide each user with his or her own information retrieval environment. At the same time, each user has individual preferences about the interface and likes to express information needs in a familiar way.
Therefore, a personalized Web search engine should provide targeted assistance to different users and meet their individual requirements, so that users are more satisfied with the query results.

Because personalized Web search engines have distinctive requirements, this paper studies some of their key technologies. First, the author studies automatic Chinese word segmentation and user interest mining. After further study of MM algorithms, the author points out some of their problems and puts forward an improved automatic Chinese word segmentation algorithm. After further study of the standard PageRank algorithm, the author proposes a personalized PageRank algorithm based on modified page weights. Based on these studies, the author implements a personalized search prototype system, MySearch, using PHP and MySQL.

This paper describes in detail two key technologies of the personalized Web search engine: automatic Chinese word segmentation and user interest mining. For word segmentation, the author first introduces the commonly used automatic Chinese word segmentation techniques and then, building on the analysis of MM algorithms, presents the improved segmentation algorithm. Compared with the SimpleAnalyzer and StandardAnalyzer segmentation algorithms, the proposed algorithm improves time efficiency by a factor of 3 to 4.
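The abstract does not spell out the improved segmentation algorithm, so the following is only a minimal sketch of the baseline it starts from, assuming "MM" refers to dictionary-based maximum matching as is common in the Chinese segmentation literature. The function name `forward_max_match` and the toy dictionary are hypothetical and are not taken from the thesis.

```python
def forward_max_match(text, dictionary, max_word_len=None):
    """Segment text by greedily taking the longest dictionary word at each position."""
    if max_word_len is None:
        # Longest dictionary entry bounds the window; 1 if the dictionary is empty.
        max_word_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy dictionary, invented for illustration only.
toy_dictionary = {"搜索", "搜索引擎", "引擎", "个性化", "研究"}
print(forward_max_match("个性化搜索引擎研究", toy_dictionary))
```

The greedy longest-first scan is what makes plain maximum matching sensitive to dictionary coverage and ambiguous overlaps, which is presumably the kind of problem an improved variant would target.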
The author also studies user interest mining, introducing the concept and examining approaches to user identification and to acquiring user interests.

In the fourth chapter, the author describes in detail Web page weight analysis, another key technology of the personalized Web search engine. The author studies the PageRank algorithm in detail, introduces the standard PageRank algorithm, and then derives an optimized PageRank algorithm step by step. The author then studies the personalized PageRank algorithm based on modified page weights. The results show that the PageRank algorithm can be given different parameters according to a user's own characteristics and browsing records. The resulting scores and page ordering therefore differ from user to user, which supplies part of the personalization.

Based on the preceding studies, the author implements the MySearch search system. The author first introduces the major modules of the system, including the web crawler module, user information module, Web page analysis module, and user interface module, and demonstrates part of the implemented system interface. In addition to the modules of a general search engine, the framework includes two further parts: (1) The user personalization feature recognition module. This module mines user characteristic information from the search engine's existing server-side logs in order to express each user's individual information needs. In the actual system, this module does not need to run in real time; it typically runs periodically, using the latest log data to obtain the characteristic information that best reflects the user's current needs. (2) The personalized PageRank calculation module.
Personalized PageRank calculation is a relatively simple way to build a personalized search engine; this module can use the personalized PageRank method based on modified page weights that is put forward in this paper.

Research on personalized search engines is important and represents a trend of future development. In this paper, the author attempts to implement an integrated personalized search engine. The design of the system rests on two ideas. First, minimize complexity for users as far as possible, so that they can complete a search without attending to the personalization details. Second, build improvements on existing search engine technology as far as possible, avoiding major adjustments to the technology and environment.

In the process of implementing the system, the author identified several problems. An important way to obtain the user personalization model is to use the server-side log information. However, this is not a completely accurate mapping, because the log information has limitations. Designing and studying more effective ways to mine users' personalized information from the logs is therefore one direction for future improvement.

Because natural language is ambiguous and words have multiple meanings, keywords alone cannot fully express a user's individual requirements. Designing and researching more effective keyword semantic analysis technology is therefore another direction for future work. The personalized PageRank algorithm based on modified page weights presented in this paper requires full access to the logs of users' network access, and the limitations of this log information may introduce errors into the method. Researching and designing more effective personalized PageRank algorithms is therefore also a direction for future work.
Using this framework to build a personalized search engine suitable for commercial use involves relatively high implementation complexity; such a system must handle massive amounts of data and effectively address potential scaling issues.
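The abstract states that giving PageRank different parameters per user changes the resulting page order, but it does not give the weighting scheme itself. As a rough illustration only, the sketch below uses the widely known personalized PageRank formulation in which the random-jump distribution is biased by per-user page weights. It is not the thesis's algorithm; the function `personalized_pagerank`, the `user_weights` input (imagined here as derived from browsing logs), and the toy link graph are all assumptions.

```python
def personalized_pagerank(links, user_weights, damping=0.85, iterations=50):
    """Power iteration in which random jumps follow per-user page weights."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    total = sum(user_weights.get(p, 0.0) for p in pages)
    if total > 0:
        # Normalize the user's page weights into a jump distribution.
        jump = {p: user_weights.get(p, 0.0) / total for p in pages}
    else:
        # No weights: uniform jumps, which is standard PageRank.
        jump = {p: 1.0 / len(pages) for p in pages}
    rank = dict(jump)
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * jump[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's damped rank evenly over its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: redistribute its rank along the jump vector.
                for p in pages:
                    new_rank[p] += damping * rank[page] * jump[p]
        rank = new_rank
    return rank


# Toy link graph and user weights, invented for illustration only.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
uniform = personalized_pagerank(toy_links, {})
for_user = personalized_pagerank(toy_links, {"d": 1.0})
print(sorted(uniform, key=uniform.get, reverse=True))
print(sorted(for_user, key=for_user.get, reverse=True))
```

With the same link graph, the two calls differ only in the user weights, so any change in the printed ordering comes from the personalization input alone.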
Keywords/Search Tags: search engine, personalization, web spider, web mining, automatic segmentation, mining user interest, web weight analysis