Research And Implementation Of Technology News Service Based On Information Aggregation

Posted on:2017-03-17

Degree:Master

Type:Thesis

Country:China

Candidate:D Jiang

Full Text:PDF

GTID:2308330485453752

Subject:Control Science and Engineering

Abstract/Summary:

As Internet media outlets and user sharing channels multiply, the massive growth of information has caused a serious information overload problem. Traditional information aggregation focuses on providing richer resources; under overload, information screening and filtering become the more valuable technology. Helping users find the information they are genuinely interested in, and thereby learn more efficiently, is the new challenge for information aggregation techniques.

To relieve the information overload problem in technology news services, this dissertation explores information screening and filtering techniques based on text mining. Building on methods for calculating sentence semantic similarity, it proposes duplication detection and text clustering algorithms that incorporate semantic features, and applies them to eliminating duplicate news, mining public hot spots, and precisely locating users' topics of interest. In detail, the work and achievements include:

1. A semantics-based duplication detection method for short text. To address information redundancy in news aggregation, we propose a news duplication detection algorithm that detects not only literal duplicates and near-duplicates but also "topic-duplicate" news items that report the same event. We first discuss general methods for calculating sentence semantic similarity and improve the similarity calculation based on word embedding vectors. We then apply sentence semantic similarity to measuring the topic similarity of news. Experiments show that, compared with a traditional algorithm that is purely syntax-based, our algorithm greatly improves recall while maintaining high precision. The algorithm can therefore remove more of the redundancy in news aggregation.

2. A text clustering algorithm based on semantics and graphs. Traditional text clustering algorithms often use the bag-of-words model to build document vectors, ignoring the semantic information between words, and centroid-based partitioning methods tend to rigidly split clusters whose concepts are closely related. By integrating word-vector semantic models with graph clustering algorithms that can uncover strongly connected natural clusters, we propose a short-text clustering algorithm that makes up for these shortcomings. Human evaluation of 21 clusters in the experiment shows that the new algorithm captures topic information better and achieves higher clustering purity than the traditional k-means method, making it better suited to the news topic mining task.

3. Using the above algorithms, we build the "Technology Vision" news service system, which makes news aggregation results more compact and improves the user experience. The system has been released on the Android Application Market and runs stably.
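
The two ideas in the abstract, embedding-based sentence similarity and graph-based grouping of similar items, can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: the abstract does not say how the similarity calculation was improved or which graph clustering method was used. The sketch assumes the common baseline of averaging word vectors and comparing them by cosine similarity, and it substitutes connected components of a thresholded similarity graph for the graph clustering step. The vectors in TOY_VECTORS, the headlines, the function names, and the 0.9 threshold are all hypothetical values chosen for illustration.

```python
import math
from itertools import combinations

# Hypothetical 3-dimensional word vectors, invented for this illustration.
# A real system would load vectors trained on a large news corpus.
TOY_VECTORS = {
    "apple":    [0.9, 0.1, 0.0],
    "phone":    [0.8, 0.2, 0.0],
    "unveils":  [0.1, 0.9, 0.1],
    "launches": [0.1, 0.8, 0.2],
    "spacex":   [0.0, 0.1, 0.9],
    "rocket":   [0.1, 0.0, 0.9],
}

# Hypothetical headlines: 0 and 1 report one event, 2 and 3 another.
HEADLINES = [
    "Apple unveils phone",
    "Apple launches phone",
    "SpaceX launches rocket",
    "SpaceX unveils rocket",
]


def sentence_vector(sentence):
    """Average the vectors of the known words; None if no word is known."""
    vectors = [TOY_VECTORS[w] for w in sentence.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return None
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity(s1, s2):
    """Sentence similarity as the cosine of the averaged word vectors."""
    u, v = sentence_vector(s1), sentence_vector(s2)
    if u is None or v is None:
        return 0.0
    return cosine(u, v)


def cluster_headlines(headlines, threshold=0.9):
    """Link headlines with similarity >= threshold and return the
    connected components of that graph as sorted lists of indices."""
    parent = list(range(len(headlines)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(headlines)), 2):
        if similarity(headlines[i], headlines[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(headlines)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

With these toy vectors, cluster_headlines(HEADLINES) groups the two Apple headlines together and the two SpaceX headlines together, even though "unveils" and "launches" share no characters. Treating each resulting group as one "topic-duplicate" set is the simplest way to connect the duplication detection and clustering steps; a production system would need a tuned threshold and a clustering method that is more robust to chains of weak links.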

Related items

1	Research On Semantic Representation Of Text Based On Topic Model
2	Text Classification Based On Word Vector And Topic Vector
3	The Research On Short Text Semantic Mining Based On Topic Model And Word Vector
4	Research On Text Clustering Algorithm Based On Word Frequency And Semantic
5	Hot Topic Discovery And The Application Of Word Cloud Based On Voronoi
6	Research On Text Semantic Mining Based On Topic Model And Paragraph Vector
7	Research On Short Text Topic Information Mining Technology
8	Research On Hot Topic Detection Methods For Microblog
9	Study On The Chinese Text Clustering Algorithm Based On Semantic Similarity
10	Research On The Calculation Method Of Han-Thai Bilingual News Text Similarity With News Elements