
Research On Directional Acquisition And Automatic Summarization For Network Data

Posted on: 2019-11-16
Degree: Master
Type: Thesis
Country: China
Candidate: C C Yang
Full Text: PDF
GTID: 2428330566999350
Subject: Software engineering
Abstract/Summary:
The rapid popularization of technologies such as the Internet and the Internet of Things has driven rapid growth of data on online platforms. Accurate, directional acquisition of network data is important for data mining, yet existing approaches suffer from low acquisition precision. Moreover, given the massive volume of collected content, extracting valuable information from it is itself a research problem. Traditional summarization algorithms consider only keyword frequency, without related semantic analysis, so the resulting document summaries suffer from low precision and low recall. This thesis therefore studies two aspects: improving the accuracy of data acquisition, and extracting key information as a summary. The main work is as follows:

(1) For directional data acquisition, this thesis proposes an Adaptive Crawling Algorithm (ACA) for network data. The algorithm introduces text weighting to assign weights to keywords and computes the relevance of web pages with a vector space model. A page's importance is judged from the relevance of its links to the topic; a fitness function filters the pages related to the topic, and the system model is adjusted dynamically according to the pages acquired in real time. The crawler is built on the Hadoop distributed platform and acquires pages in parallel, making full use of the computing resources of each node to improve the acquisition rate.

(2) For summary generation, this thesis proposes a Multi-Document Summarization algorithm based on Topic Clustering (MDSTC). First, the algorithm adds a sample density function to the clustering algorithm, so that the initial number of clusters and the cluster centers are determined automatically from statistical information; the system thereby discovers the number of potential subtopics in the document set. A convolutional neural network is then trained on the clustered topic texts to score and label sentences, and the central sentences with the highest relevance in the different subtopics are extracted as the summary.

(3) Finally, a prototype system is built that collects "Earthquake" information, displays the collected content through web pages, and uses the automatic summarization module to condense the massive collected content into a valuable summary.
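The vector-space relevance and fitness-function filtering used by the adaptive crawler can be sketched as follows. This is a minimal illustration, not the thesis's exact formulation: the function names, the term-frequency weighting, and the `alpha` blend between page content and link context are all illustrative assumptions.

```python
import math
from collections import Counter

def relevance(page_tokens, topic_weights):
    """Cosine similarity between a page's term-frequency vector and a
    weighted topic-keyword vector (vector space model)."""
    tf = Counter(page_tokens)
    dot = sum(tf[w] * wt for w, wt in topic_weights.items())
    page_norm = math.sqrt(sum(v * v for v in tf.values()))
    topic_norm = math.sqrt(sum(wt * wt for wt in topic_weights.values()))
    if page_norm == 0 or topic_norm == 0:
        return 0.0
    return dot / (page_norm * topic_norm)

def fitness(page_tokens, link_relevances, topic_weights, alpha=0.7):
    """Fitness of a page: blend its own content relevance with the
    average relevance of the link context that led to it. Pages whose
    fitness falls below a threshold are filtered out of the crawl."""
    content = relevance(page_tokens, topic_weights)
    link = sum(link_relevances) / len(link_relevances) if link_relevances else 0.0
    return alpha * content + (1 - alpha) * link
```

In a running crawler, `topic_weights` would come from the text-weighting step, and the threshold on `fitness` (and the weights themselves) would be adjusted dynamically as pages are acquired.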
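The density-based initialization behind MDSTC's clustering step can be sketched as follows: repeatedly take the densest remaining sample as a cluster center and exclude its neighbourhood, so the number of centers (i.e. subtopics) emerges from the data rather than being fixed in advance. The 2-D points, the `radius`, and the `min_density` cutoff are illustrative assumptions standing in for document feature vectors.

```python
import math

def density(points, i, radius):
    """Sample density of point i: how many other points lie within
    `radius` of it."""
    xi, yi = points[i]
    return sum(1 for (x, y) in points
               if math.hypot(x - xi, y - yi) <= radius) - 1

def initial_centers(points, radius, min_density):
    """Greedy density-peak initialization: pick the densest remaining
    point as a center, drop its neighbourhood, repeat. Both the number
    of clusters and the initial centers are determined automatically."""
    remaining = list(range(len(points)))
    centers = []
    while remaining:
        best = max(remaining, key=lambda i: density(points, i, radius))
        if density(points, best, radius) < min_density:
            break  # remaining points are too sparse to seed a subtopic
        centers.append(points[best])
        bx, by = points[best]
        remaining = [i for i in remaining
                     if math.hypot(points[i][0] - bx,
                                   points[i][1] - by) > radius]
    return centers
```

The centers found this way would seed an ordinary k-means pass over the document vectors; sentence scoring within each resulting subtopic is then handled by the trained convolutional network.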
Keywords/Search Tags: Adaptive Algorithm, Data Collecting, Nutch, Distributed Platform, Automatic Summary, Clustering Algorithm, Convolutional Neural Network