
Intelligent User Interests Modeling Method Based On The Hybrid Clustering Algorithm

Posted on: 2009-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Tian
GTID: 2178360272474290
Subject: Computer software and theory
Abstract/Summary:
Over the past decade, the amount of information on the Internet has grown exponentially, and users find it increasingly difficult to locate the valuable information and materials they want quickly. This volume has led to information overload and disorientation, whereas what users need is to find suitable material promptly, without spending much time searching. Personalized service, which can resolve this contradiction to a certain extent, has therefore become a hot research field. Many commercial companies, such as Google, claim that the next generation of the World Wide Web must be personalized and intelligent. User Interests Modeling (UIM) technology, the core of any system that provides personalized service, determines the quality of the User Profile and is therefore the key factor in whether a system can serve its users well. This paper studies UIM technology in depth; its main contributions are as follows.

First, when clustering a set of Web documents, most existing clustering algorithms require the number of clusters, k, to be specified manually. Based on an "auto-selected similarity threshold" technique, this paper proposes an "auto k value calculation" method that computes k automatically.
With this method, the program can determine how many categories of Web documents a given user is interested in.

Second, to find automatically and precisely which classes of documents a given user is interested in, this paper proposes a hybrid Web document clustering method based on k-means, the genetic algorithm (GA) and ISODATA: ① The traditional k-means algorithm may converge to a local optimum and is sensitive to the choice of the k initial centers. To overcome these disadvantages while keeping its fast convergence, this paper designs a clustering method that combines GA with k-means. GA's global search ability is generally accepted and compensates for this weakness of k-means, while k-means in turn speeds up the convergence of GA. ② Borrowing ideas from ISODATA, the algorithm dynamically splits and merges clusters during its iterations, which gives the improved algorithm a self-adjusting ability to discover clusters of different sizes.

Third, the hybrid clustering algorithm is applied to obtain the initial k partitions of all the Web documents a given user has browsed, and these initial clusters are then agglomerated hierarchically according to the "similarity between classes" until all documents are merged into a single cluster. The final result is a tree-structured classification of all the Web documents. The Interest Degree of each cluster is then calculated from the user's browsing behavior, which completes the construction of the User Profile.

Finally, experiments are carried out.
The experiments show that the "auto k value calculation" method is adequate for solving the threshold specification problem, although its results are sometimes not very precise, and that the improved hybrid clustering algorithm produces better clustering results, with a higher F-measure for the returned clusters.

The User Interests Modeling technology proposed in this paper can be applied in many personalized service fields, such as personalized recommendation and personalized search engines. It has both academic and practical value.
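
As a rough illustration of the first contribution, the idea of deriving k from a similarity threshold rather than fixing it by hand can be sketched as follows. This is not the thesis's algorithm: the abstract does not say how the threshold is auto-selected or how k is derived from it, so the sketch assumes a simple single-pass grouping over term-weight vectors with cosine similarity, and it takes the threshold as a given parameter. The names `cosine` and `estimate_k` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def estimate_k(docs, threshold):
    """Assumed single-pass grouping (not the thesis's exact procedure).

    A document joins the most similar existing group when its similarity
    to that group's centroid reaches the threshold; otherwise it starts
    a new group. The number of groups is the estimate of k.
    """
    centroids, sizes = [], []
    for doc in docs:
        best, best_sim = -1, threshold
        for i, centroid in enumerate(centroids):
            sim = cosine(doc, centroid)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best < 0:
            # No group is similar enough: open a new one.
            centroids.append(list(doc))
            sizes.append(1)
        else:
            # Update the running mean of the chosen group's centroid.
            n = sizes[best]
            centroids[best] = [
                (c * n + x) / (n + 1) for c, x in zip(centroids[best], doc)
            ]
            sizes[best] = n + 1
    return len(centroids), centroids


if __name__ == "__main__":
    # Toy term-weight vectors standing in for Web documents.
    docs = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.8, 0.2], [0, 0, 1]]
    k, _ = estimate_k(docs, threshold=0.8)
    print("estimated k:", k)
```

The centroids returned alongside k could serve as initial centers for a subsequent k-means or GA stage. A higher threshold yields more, tighter groups, which is why the choice of threshold affects the precision of the estimate.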
Keywords/Search Tags:Personalized Service, User Interests Modeling, Partitioning-based, Web Documents Clustering, Genetic Algorithm