
Research On Internet Text Mining And Personalized Recommendation

Posted on: 2015-02-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Wen
GTID: 1488304310996379
Subject: Communication and Information System
Abstract/Summary:
With the development of Internet technology, the spread of websites, and the emergence of large volumes of text data, the Internet has become an important channel through which people obtain information. But the volume of data online is so large that no one person can explore the entire Internet. Simplifying the process of exploring the network and improving the efficiency of information retrieval have therefore become popular research directions. A good information mining method improves retrieval efficiency: it collects network information accurately, promptly, and reliably, and provides readers with timely summaries. Meanwhile, as network technology develops, more and more websites deliver information to users without manual searching. This approach is information recommendation. By providing relevant information or related products at the right time, it can increase users' interest in browsing and their stickiness to a website. Recommendation will be another major means of information access in the future. It has great prospects, with value not only for recommending Internet news and related texts but also for e-commerce, product promotion, and the dissemination of new products. In view of this, this paper combines interdisciplinary research methods and proposes models for Internet hot topic detection and automatic summary generation. It also develops personalized recommendation algorithms based on research into user preferences and user interests.
This paper focuses on Internet data acquisition, text clustering algorithms, hot information mining, network news summarization methods, collaborative filtering recommendation algorithms, and community-based recommendation. The major works and innovations of the paper are as follows:

1) This paper studies Internet text collection and preprocessing techniques, Chinese word segmentation, and clustering methods, and then proposes a hotspot event discovery algorithm based on the characteristics of Internet text. By introducing a word burst metric and taking the position of words into account, it improves the accuracy of the computed term weights. It divides texts into themes using a preset-density-based maximum link clustering algorithm, treating similar texts as the cores of clusters, so that the hot events of a period can be discovered automatically. Experimental results show that the algorithm achieves better results in finding Internet hotspot events.

2) This paper studies the automatic summarization of Internet texts. The algorithm compresses text and represents it in summary form, giving users quick access to the main content. It analyzes summary information across multiple Internet news texts and puts forward the concept of summary topics. Internet news is divided into sentences, and the summary topics form information clusters according to the results of hierarchical clustering of those sentences. In addition, user comments on Internet news are used to further improve the accuracy of summarization. News sentences and comment sentences are mapped to network nodes, and the HITS algorithm is applied to analyze node weights and to calculate the influence of sentence position. Comments influence the news body text, and raising the weight of these sentences significantly improves the selection of summary sentences. Experimental results show that the algorithm using comments outperforms the algorithm without them. The study provides a basis for further Internet information extraction and automatic summarization.

3) This paper studies collaborative filtering recommendation and improves recommendation accuracy with a collaborative filtering algorithm that modifies the conventional similarity computation. By considering the preferences of different users and the similarity between those preferences, it presents a logarithm-based similarity formula. The improved algorithm is tested on real micro-blog data. Micro-blogs are clustered into topic categories, the relationship between users and these categories is derived, and the improved collaborative filtering algorithm is then used to make recommendations. Experimental results show that the recommendations effectively hit the micro-blog validation data set. Compared with traditional collaborative filtering algorithms, the new algorithm substantially increases recommendation accuracy and gives better personalized recommendations.

4) This paper also studies recommender systems from the community perspective. It presents two models of community formation and studies which recommendation method suits each. It proposes two similarity calculation models suited to communities, compares them with the traditional similarity model, and tests several similarity models under different community formations. Experiments on the MovieLens dataset verify that the community-based model outperforms the traditional heat conduction model and the probabilistic transmission model in both the accuracy and the diversity of recommendation. Finally, the paper compares the two community formation models and finds that the non-strict division model achieves higher accuracy and diversity than the strict division model. The non-strict community model is therefore more suitable for recommender systems, especially for personalized recommendation.
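For orientation, the two baselines named above (heat conduction and probabilistic transmission) are commonly described as two-step diffusion processes on a user-item bipartite graph. The sketch below follows that common textbook form and is an assumption: the abstract does not give the dissertation's exact formulation, and it does not implement the community-based models. The function names `spread_scores` and `recommend` and the toy data are hypothetical.

```python
def spread_scores(user_items, target, mode="probs"):
    """Score every item for `target` on a user-item bipartite graph.

    user_items: dict mapping each user to the set of items they collected.
    mode="probs": probabilistic spreading (resource is split, total conserved).
    mode="heats": heat conduction (each node takes the average of neighbours).
    Assumed textbook forms, not the dissertation's own equations.
    """
    if mode not in ("probs", "heats"):
        raise ValueError("mode must be 'probs' or 'heats'")

    # Invert the mapping: item -> set of users who collected it.
    item_users = {}
    for user, items in user_items.items():
        for item in items:
            item_users.setdefault(item, set()).add(user)

    # Initial resource: 1 on each item the target user collected, else 0.
    resource = {i: (1.0 if i in user_items[target] else 0.0) for i in item_users}

    # Step 1: items -> users.
    user_level = {}
    for user, items in user_items.items():
        if not items:
            continue  # users with no items take no part in the diffusion
        if mode == "probs":
            # Each item splits its resource equally among its users.
            user_level[user] = sum(resource[i] / len(item_users[i]) for i in items)
        else:
            # Each user takes the average resource of its items.
            user_level[user] = sum(resource[i] for i in items) / len(items)

    # Step 2: users -> items.
    scores = {}
    for item, users in item_users.items():
        if mode == "probs":
            # Each user splits its resource equally among its items.
            scores[item] = sum(user_level[u] / len(user_items[u]) for u in users)
        else:
            # Each item takes the average resource of its users.
            scores[item] = sum(user_level[u] for u in users) / len(users)
    return scores


def recommend(user_items, target, mode="probs", top_n=3):
    """Return up to top_n items the target has not collected, best first."""
    scores = spread_scores(user_items, target, mode)
    unseen = [i for i in scores if i not in user_items[target]]
    # Sort by descending score; break ties by item name for determinism.
    return sorted(unseen, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical toy data: three users, four items.
toy = {"u1": {"a", "b"}, "u2": {"a", "c"}, "u3": {"b", "c", "d"}}
print(recommend(toy, "u1", mode="probs"))
print(recommend(toy, "u1", mode="heats"))
```

In this form, probabilistic spreading conserves the total resource and tends to favour popular items, while heat conduction averages over neighbours and gives relatively more weight to low-degree items. This is the usual way the accuracy and diversity trade-off between the two baselines is explained.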
Keywords/Search Tags: Topic Discovery, Automatic Summarization, Clustering Algorithm, Collaborative Filtering, Personalized Recommendation