Keyphrase Extraction Using Phrase Co-Occurrences

Posted on: 2016-06-26
Degree: Master
Type: Thesis
Country: China
Candidate: J B Guo
Full Text: PDF
GTID: 2308330473957036
Subject: Computer application technology
Abstract/Summary:
With the growth of the Internet, vast amounts of data are generated online, and big data techniques are needed for batch processing and analysis of large data sets. The unit of data volume has grown from TB to PB, and even to DB. Big data also demands fast processing. It has therefore become very important to access and process critical information effectively and quickly. In this thesis, critical information means the keywords or keyphrases of a document, which can effectively summarize its central idea. Meanwhile, with the explosion of portals, how to find interesting news in a mass of documents is also a hot issue. The focus of this thesis is twofold: extracting keyphrases from documents, and recommending news using the extracted keyphrases.

(1) A keyphrase extraction algorithm is proposed. The algorithm extracts keyphrases directly from a document, unlike supervised approaches, which need a training set to construct a classifier that is then used to extract keyphrases. The proposed algorithm consists of three steps: candidate identification, weight calculation for each candidate, and keyphrase selection. We select high-quality candidate phrases and use efficient features to calculate the weights of phrases. Experiments are conducted to validate the effectiveness and efficiency of the proposed keyphrase extraction method. We have also designed a prototype system with the keyphrase extraction algorithm.

(2) A personalized news recommendation algorithm based on domain ontologies is proposed. An ontology library of news domains is used to calculate the similarity between keyphrases and user interests, and the algorithm recommends news to users according to the level of similarity. The algorithm also updates users' interests based on their browsing histories.
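
The three-step structure of the extraction algorithm (candidate identification, weight calculation, keyphrase selection) can be illustrated with a minimal Python sketch. The abstract does not state which candidate filter or weighting features the thesis uses, so everything below is an assumption chosen for illustration: candidates are word runs between stopwords and punctuation, and the weight is a phrase's frequency multiplied by one plus its number of sentence-level co-occurrences with other candidates. The function names and the small stopword list are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "to", "and", "or",
             "is", "are", "from", "with", "by", "that", "this", "it", "as"}


def candidates_in(sentence, max_words=3):
    """Step 1: candidate identification.

    Split the sentence at stopwords and non-letter characters and keep
    the remaining word runs of at most `max_words` words.
    """
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*|[^\sa-z]", sentence.lower())
    phrases, current = [], []
    for tok in tokens + ["."]:  # trailing sentinel flushes the last run
        is_word = "a" <= tok[0] <= "z"
        if tok in STOPWORDS or not is_word:
            if 0 < len(current) <= max_words:
                phrases.append(" ".join(current))
            current = []
        else:
            current.append(tok)
    return phrases


def score_phrases(text):
    """Step 2: weight calculation (assumed formula, not the thesis's).

    weight(p) = frequency(p) * (1 + co-occurrences of p with other
    candidates, counted once per sentence and per partner phrase).
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    freq, cooc = Counter(), Counter()
    for sentence in sentences:
        found = candidates_in(sentence)
        freq.update(found)
        for a, b in combinations(sorted(set(found)), 2):
            cooc[a] += 1
            cooc[b] += 1
    return {p: freq[p] * (1 + cooc[p]) for p in freq}


def extract_keyphrases(text, top_k=5):
    """Step 3: keyphrase selection. Return the top_k highest-weighted phrases."""
    weights = score_phrases(text)
    ranked = sorted(weights, key=lambda p: (-weights[p], p))  # ties broken alphabetically
    return ranked[:top_k]


if __name__ == "__main__":
    sample = ("Keyphrase extraction is the focus of text mining. "
              "Text mining and keyphrase extraction are for news recommendation. "
              "News recommendation is from keyphrase extraction.")
    print(extract_keyphrases(sample, top_k=2))
```

Like the approach described above, this sketch needs no training set: all statistics come from the single input document. A fuller implementation would replace the stopword split and the simple weight with the candidate filter and features evaluated in the thesis.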
Keywords/Search Tags:Data Mining, Keyphrase Extraction, News Recommendation, Phrase Co-occurrence, Text Mining