Design And Implementation Of Technical Article Personalized Recommendation System

Posted on: 2018-08-22
Degree: Master
Type: Thesis
Country: China
Candidate: H Zheng
Full Text: PDF
GTID: 2348330518492156
Subject: Computer technology
Abstract/Summary:
The rapid development and wide adoption of the Internet have made it one of the main channels for disseminating and acquiring information. With the advent of the big data era, the volume of data on the network is expanding rapidly, and users faced with this mass of information must spend more effort to find what is valuable. Personalized recommendation technology is used to address this information overload problem. In recent years, as portal and information websites in various fields have continued to emerge, online reading, which also needs personalized recommendation, has gradually become one of the most popular Internet demands, and a market for individualized reading applications has emerged.

Most content-based recommendation systems use the relatively simple vector space model, but this model cannot handle "polysemy" (one word with several meanings) or "synonymy" (several words with the same meaning). To address these shortcomings, this thesis applies a topic model to individualized reading recommendation and text classification, and designs and implements a personalized technical article recommendation system for developers. The system meets developers' personalized reading needs and has good application prospects and commercial value.

The main work completed in this thesis is as follows:

1. The Web Crawler and Data Preprocessing for Technical Articles
To acquire the objects to be recommended, a stand-alone multithreaded crawler is first implemented with the WebMagic crawler framework, and articles from technical websites are downloaded to a file server and a database. The main body of each webpage is extracted with techniques such as HTML Parser, XPath and CSS selectors, and irrelevant elements in the main body are removed. Finally, text processing such as word segmentation, part-of-speech tagging and stop-word removal is performed on the main body using the FNLP natural language processing toolkit.

2. The Personalized Recommendation Based on User Interests and the Similar Recommendation Based on the Topics of Articles
This thesis implements personalized recommendation based on user interests, so as to recommend technical articles that may interest each user. The procedure consists of five steps:
(1) finding the optimal number of topics;
(2) building an LDA model over the preprocessed technical articles and representing each article by its topics;
(3) building an interest model for each user with Logistic Regression, based on the user's historical behavior data;
(4) inferring the topic distribution of new technical articles with the trained LDA model;
(5) combining the topic distribution of the technical articles with the user interest model to compute personalized scores and generate a personalized recommendation list for the user.
Comparison experiments against a recommendation method that combines the vector space model with TF-IDF show that the LDA-based method gives better recommendation results.
This thesis also implements similar-article recommendation based on article topics, so as to recommend technical articles that are similar to a given article at the topic level. With the Hellinger distance as the similarity measure, the Top-3 closest technical articles are taken as the similar articles of a target article.

3. Browsing in Accordance with the Categories of Technical Articles
To support browsing articles by category, the system must predict the technical category of each article. The procedure consists of five steps:
(1) extracting the Top-N terms under every topic in the LDA model and combining them into a feature dictionary;
(2) preprocessing the text of the training data, collecting statistics for all terms and calculating the TF-IDF value of each term;
(3) checking the terms in each document, using each term contained in the feature dictionary as a feature term with its TF-IDF value as the feature weight, mapping the document to a feature vector, and normalizing all feature vectors to obtain a training set;
(4) searching for optimal parameters with the script provided by LIBSVM and then training a support vector machine classifier;
(5) using the support vector machine classifier to predict the category of technical articles whose category is unknown.
Comparison experiments against three other commonly used feature selection methods show that the text classification scheme in this thesis gives better classification results.

4. The Web Interactive System
A prototype website that interacts with users is designed and implemented. Users can browse the recommendation list generated by the recommendation system through the site, and the site also records users' browsing and click data.
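As an illustration of steps (3) to (5) of the personalized recommendation in item 2, the following minimal Python sketch trains a per-user logistic regression model on article topic distributions and ranks new articles by predicted interest. The abstract does not say which features or labels the thesis uses, so treating the topic distribution as the feature vector and a clicked/not-clicked flag as the label is an assumption, as are the function names, the learning rate, the number of epochs and the toy data.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def train_user_model(topic_dists, clicked, lr=0.5, epochs=500):
    """Fit logistic regression by stochastic gradient descent.

    topic_dists: list of topic distributions (one per article the user saw).
    clicked: list of 0/1 labels (assumed form of the behavior data).
    Returns (weights, bias).
    """
    w = [0.0] * len(topic_dists[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(topic_dists, clicked):
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            err = p - y  # gradient of the log loss w.r.t. the linear score
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def recommend(model, candidates, k=2):
    """Rank candidate articles by predicted interest and return the top k ids."""
    w, b = model

    def score(dist):
        return sigmoid(b + sum(wi * xi for wi, xi in zip(w, dist)))

    ranked = sorted(candidates, key=lambda aid: score(candidates[aid]), reverse=True)
    return ranked[:k]


# Hypothetical history: this user clicks articles dominated by topic 0.
history = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.2, 0.6],
]
labels = [1, 1, 0, 0, 0]

# Hypothetical topic distributions inferred for new articles.
new_articles = {
    "x": [0.9, 0.05, 0.05],
    "y": [0.1, 0.45, 0.45],
    "z": [0.5, 0.3, 0.2],
}

model = train_user_model(history, labels)
top = recommend(model, new_articles, k=2)
```

In a real system the topic distributions would come from the trained LDA model rather than being written by hand, and the model would typically be regularized and evaluated on held-out behavior data.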
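The similar-article recommendation in item 2 can be sketched in the same way. The snippet below computes the Hellinger distance between topic distributions and returns the three nearest articles to a target. The article ids and distributions are made up for illustration, and the function names are hypothetical rather than taken from the thesis.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same topics")
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total / 2.0)


def most_similar(target_id, topic_dists, k=3):
    """Return the ids of the k articles closest to target_id (smallest distance first)."""
    target = topic_dists[target_id]
    scored = [
        (hellinger(target, dist), aid)
        for aid, dist in topic_dists.items()
        if aid != target_id
    ]
    scored.sort()
    return [aid for _, aid in scored[:k]]


# Hypothetical per-article topic distributions (each sums to 1).
articles = {
    "a": [0.7, 0.2, 0.1],
    "b": [0.6, 0.3, 0.1],
    "c": [0.1, 0.1, 0.8],
    "d": [0.7, 0.1, 0.2],
    "e": [0.2, 0.7, 0.1],
}

similar_to_a = most_similar("a", articles)  # ["b", "d", "e"]
```

The Hellinger distance lies between 0 (identical distributions) and 1 (no overlapping topics), which makes it convenient for comparing probability vectors such as LDA topic distributions.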
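For the text classification in item 3, step (3) maps each document to a normalized TF-IDF vector over the feature dictionary. The sketch below shows one way this could look. The abstract does not state the exact TF-IDF formula or the normalization used, so the relative term frequency, the idf of log(N / df) and the unit-length (L2) normalization are all assumptions, and the tokens and feature dictionary are invented for the example.

```python
import math
from collections import Counter


def build_feature_vectors(docs, feature_dict):
    """Map tokenized documents to L2-normalized TF-IDF vectors.

    docs: list of token lists (already segmented and stop-word filtered).
    feature_dict: ordered list of feature terms, e.g. the Top-N terms per LDA topic.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    index = {term: i for i, term in enumerate(feature_dict)}

    vectors = []
    for tokens in docs:
        vec = [0.0] * len(feature_dict)
        for term, count in Counter(tokens).items():
            if term in index:  # terms outside the feature dictionary are ignored
                tf = count / len(tokens)
                idf = math.log(n_docs / df[term])
                vec[index[term]] = tf * idf
        norm = math.sqrt(sum(v * v for v in vec))
        if norm > 0:
            vec = [v / norm for v in vec]
        vectors.append(vec)
    return vectors


# Hypothetical tokenized documents and feature dictionary.
docs = [
    ["java", "thread", "java", "the"],
    ["python", "thread", "the"],
    ["java", "jvm", "the"],
]
feature_dict = ["java", "python", "thread", "jvm"]

vectors = build_feature_vectors(docs, feature_dict)
```

Vectors of this form can then be written out in the sparse "label index:value" format that LIBSVM reads and passed to its parameter search and training tools, as in steps (4) and (5).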
Keywords/Search Tags: technical article, personalized recommendation system, LDA model, text classification