
The Research And Implementation Of User Interest Mining System Based On Ontology

Posted on: 2014-10-05
Degree: Master
Type: Thesis
Country: China
Candidate: S Y Shen
GTID: 2268330425975989
Subject: Software engineering
Abstract/Summary:
With the continuous evolution of business models, many commercial companies want strategies that are not only consistent with their business traits but can also provide personalized service to different customers. One of the key factors in such differentiated service is the user interest model.

Data mining technology, which is usually based on the relationship between users and items, captures user interest features and recommends products that users may be interested in. Some non-e-commerce companies, such as search engine vendors and Internet service providers, find it hard to construct an effective user interest model with general data mining technology because they lack user purchase records. These companies may, however, have another valuable resource: the user's browsing history. This dissertation discusses a user interest mining system that focuses on the URL data in a user's browsing history. We propose a novel process for building user profiles based on a reference interest ontology. Experiments on real user data demonstrate the feasibility and practicality of this system.

The main research work of this dissertation can be summarized as follows:

Firstly, a set of integrated and efficient training methods for the concepts in the interest ontology is proposed. We use keywords from the predefined ontology to construct search URLs for a specific search engine. Our system crawls the search results as a training document set for the concepts in the ontology. The core content of every webpage is extracted by a method that combines XPath-based extraction with an improved web content extraction algorithm based on the distribution function of line blocks. After an inverted index is built with Lucene, the TF-IDF vector of each concept can be calculated accurately and efficiently.

Secondly, an interest modeling method that incorporates the user's browsing behavior is proposed. A user interest model is essentially an instance of a reference ontology in which concepts are annotated. This dissertation proposes a spreading activation algorithm that integrates the user's browsing pattern to initialize and update the interest scores. Our approach, which considers the hierarchical relationships among the concepts, not only captures explicit interests correctly but also partly predicts the user's potential interests. In addition, this approach successfully addresses the cold-start problem of general interest mining algorithms.
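The abstract does not give the details of the spreading activation step, so the following is only a minimal sketch of the general idea, not the dissertation's algorithm. It assumes a tree-shaped hierarchy stored as a child-to-parent map, a fixed decay factor per level, and a simple retention factor for updating an existing profile. The names `PARENT`, `spread_activation`, `update_profile`, `decay`, `threshold`, and `retention` are all hypothetical, as are the toy concepts and scores.

```python
from collections import defaultdict

# Hypothetical toy hierarchy (child -> parent); a real reference
# ontology would be much larger and is not described in the abstract.
PARENT = {
    "Basketball": "Sports",
    "Football": "Sports",
    "Sports": "Root",
    "Laptops": "Computers",
    "Computers": "Root",
}


def spread_activation(evidence, parent, decay=0.5, threshold=0.01):
    """Propagate each concept's evidence score up to its ancestors.

    evidence: concept -> score observed from browsing (assumed to be
    computed elsewhere, e.g. from page/concept similarity).
    At every step up the hierarchy the energy is multiplied by `decay`;
    propagation stops once it falls below `threshold`.
    """
    activated = defaultdict(float)
    for concept, energy in evidence.items():
        node = concept
        while node is not None and energy >= threshold:
            activated[node] += energy
            node = parent.get(node)  # None once we step past the root
            energy *= decay
    return dict(activated)


def update_profile(profile, evidence, parent, retention=0.8):
    """Fade old interest scores, then add newly spread activation.

    An empty `profile` (a brand-new user) is handled the same way as
    an existing one, so the first batch of evidence initializes it.
    """
    updated = {concept: score * retention for concept, score in profile.items()}
    for concept, score in spread_activation(evidence, parent).items():
        updated[concept] = updated.get(concept, 0.0) + score
    return updated


if __name__ == "__main__":
    # Assumed evidence scores for one browsing session.
    session = {"Basketball": 1.0, "Football": 0.5}
    profile = update_profile({}, session, PARENT)
    for concept, score in sorted(profile.items(), key=lambda kv: -kv[1]):
        print(f"{concept}: {score:.3f}")
```

In this sketch a parent concept such as `Sports` receives a score even though the user never visited a page classified directly under it, which illustrates how a hierarchy-aware approach could suggest potential interests beyond the explicit ones.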
Keywords/Search Tags: Interest Mining, Ontology, Spreading Activation