A Design And Application Of Personalized Information Retrieve And User Recommendation On Search Engine

Posted on: 2012-04-15
Degree: Master
Type: Thesis
Country: China
Candidate: X L Xu
GTID: 2218330338974410
Subject: Computer application technology
Abstract/Summary:
With the amount of information available on the Internet growing exponentially, retrieving documents of interest has become increasingly difficult. For most Internet users, a search engine is currently a convenient and efficient way to obtain information. Traditional search engines such as Google, Yahoo, and Baidu have developed rapidly. By June 2010, the number of Internet users in China had grown to 400 million, over 75 percent of whom used search engines. The proportion is much higher abroad. The search engine market is therefore promising.

Most existing search engines share a shortcoming: they return the same results for the same query regardless of who issues it, and so fail to reflect the user's true interests. In practice, because users differ in age, gender, educational background, and field of specialization, the intent behind the same keyword may vary from user to user. A personalized search engine builds a user interest model by analyzing document structure, the user's behavior, and the user's evaluation of documents, and uses that model to guide retrieval and ranking so that the results meet the needs of each user.

This paper implements three functions: 1) Chinese word segmentation; 2) user recommendation; 3) personalized search, which returns different search results to different users even when they enter the same query term.

Chinese word segmentation is a natural language processing technique and the basis of indexing and searching in a search engine. Popular Chinese word segmentation algorithms are dictionary-based, statistics-based, or rule-based. The dictionary-based approach, widely used by programmers, is simple in design, but it faces two difficulties: resolving ambiguous segmentations and identifying unknown words. Both remain open problems despite the efforts of many researchers.
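As an illustration of the dictionary-based idea, the following is a minimal sketch of forward maximum matching, one common dictionary-based technique. It is not necessarily the algorithm used in the thesis; the function name, the `max_word_len` parameter, and the toy dictionary are assumptions made for this example.

```python
def forward_max_match(text, dictionary, max_word_len=4):
    """Segment text greedily, taking the longest dictionary word at each position."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for size in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted, so unknown words
            # fall back to character-by-character output.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary; a real one would hold many thousands of entries.
toy_dictionary = {"搜索", "引擎", "搜索引擎", "个性化", "推荐"}
print(forward_max_match("个性化搜索引擎推荐", toy_dictionary))
# → ['个性化', '搜索引擎', '推荐']
```

The sketch also shows the two difficulties named above: the greedy longest match picks one reading without checking for ambiguity, and a word missing from the dictionary is broken into single characters.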
The segmentation algorithms presented in this paper improve segmentation accuracy by refining existing algorithms.

When a user looks for information, the input query is the main medium between the user and the search engine. The accuracy and frequency of the query terms directly determine how efficiently precise and comprehensive information can be retrieved. In most cases, however, users cannot express their requirements accurately. By recommending other users' descriptions of the same problem, the suggestion function and related searches can help considerably: they help the user correct the query and thereby obtain information faster and more appropriately. Adding a user recommendation function to a traditional search engine, so that users see others' similar descriptions of the same question while typing a query, is a significant step forward for search engines.

Personalized retrieval represents and manages users' interests by mining information about those interests. The data mined include the keywords entered, clicks on result pages, navigation between web pages, and bookmarks. Through constant updating and maintenance, the optimized model gradually comes to reflect each user's interests and needs accurately, and it supports personality analysis in further work. When a user submits a new query, this user information is used to return more customized search results and improve the user experience.

The innovations of this paper are as follows:

1) It uses a typical dictionary-based Chinese word segmentation algorithm whose dictionaries contain 119803 commonly used Chinese words and 1015 computer terms. It also has a new-word dictionary, which is maintained through active learning by a program that continually adds unknown and new words.
The segmentation algorithm in this paper improves both the speed and the accuracy of segmentation compared with the existing algorithm.

2) It achieves personalized search on top of a general-purpose search engine by adding a user interest model to it. The user interest model is built by mining users' browsing history and browsing activity from web logs, and by finding similar users within a user community and using their search results or points of interest. The model improves the accuracy and scope of information processing.
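The idea of adding a user interest model to a general-purpose engine can be sketched as a re-ranking step. The code below is only an illustration under stated assumptions: the abstract does not specify the model, so a simple term-frequency profile built from clicked documents stands in for it, and `build_interest_model`, `personalize`, the `weight` parameter, and the sample data are all hypothetical.

```python
from collections import Counter


def build_interest_model(clicked_docs):
    """Count how often each term appears in documents the user clicked."""
    model = Counter()
    for terms in clicked_docs:
        model.update(terms)
    return model


def personalize(results, model, weight=0.5):
    """Re-rank (doc_id, base_score, terms) results by adding an interest bonus."""
    total = sum(model.values()) or 1  # avoid division by zero for new users

    def score(result):
        _, base_score, terms = result
        # Share of the user's recorded interest that this document's terms cover.
        interest = sum(model[t] for t in terms) / total
        return base_score + weight * interest

    return [doc_id for doc_id, _, _ in sorted(results, key=score, reverse=True)]


# Hypothetical results for the query "apple" from the underlying engine.
results = [
    ("fruit", 1.0, ["apple", "fruit", "recipe"]),
    ("phone", 0.9, ["apple", "iphone", "phone"]),
]
tech_user = build_interest_model([["iphone", "phone"], ["phone", "android"]])
new_user = build_interest_model([])

print(personalize(results, tech_user))  # → ['phone', 'fruit']
print(personalize(results, new_user))   # → ['fruit', 'phone']
```

The two users issue the same query but receive different orderings, which is the behavior the abstract describes. A user with no history keeps the engine's original ranking.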
Keywords/Search Tags: search engine, user recommendation, Chinese word segmentation, Lucene, Ajax