Automatic Summarization Algorithm For Chinese Short Text

Posted on:2018-09-23

Degree:Master

Type:Thesis

Country:China

Candidate:Y Cui

GTID:2348330542468709

Subject:Computer software and theory

Abstract/Summary:

Social media platforms such as Weibo and Twitter attract large numbers of users who publish and share information, because they are easy to use, make interaction convenient, cover rich topics, and update in real time. As a result, they have become one of the main channels through which users obtain information, and they also provide useful data that helps businesses make decisions and seize opportunities. To make the information acquired from these platforms more comprehensive and diverse, automatic summarization of short text has become a key technology.

This thesis focuses on extractive summarization of Chinese short text. Taking into account both the characteristics of short text and the advantages of clustering-based summarization, it proposes an automatic summarization algorithm for short texts from social networks. The algorithm aims to filter out redundant information and content noise effectively and to produce a summary that reflects the key information from all aspects of the dataset. Such summaries can help enterprises make decisions and help government agencies carry out public opinion management, which gives the work practical significance [1].

First, short texts have limited length, sparse features, and little contextual semantics, so the semantic information of their words needs to be extended. This thesis therefore obtains word embeddings by training a Word2Vec model. More importantly, word embeddings preserve semantic relations under arithmetic operations, so processing a short text can be simplified to operations on the embeddings of the words it contains.

Second, to calculate word weights, this thesis identifies three main influencing factors: word frequency, left and right entropy, and word coverage. It then constructs an influence transfer matrix and, following the idea of TextRank, redesigns the method for calculating word weights.

Third, a new short text similarity algorithm is proposed that combines word weights with the semantic information of words. To improve the accuracy of short text similarity, computing the similarity between two short texts is transformed into the problem of moving all the words of one text to the other with the shortest total distance.

Fourth, the short texts are clustered with a density-based clustering algorithm. The number of clusters and the cluster centers are determined by calculating, for each short text, its local density and its shortest distance to any short text of higher density; all remaining short texts are then assigned to the clusters they belong to. The method needs only a single pass to complete clustering, which greatly improves clustering efficiency.

Finally, the weight of each short text is calculated from the weights of its words, the short texts in each cluster are ranked by weight, and the most important short texts from each cluster are extracted to form the summary. In this way, the summary covers all aspects of the information, and both its diversity and its quality are improved.
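Left and right entropy is one of the word-weighting factors named above, but the abstract does not give its formula. The following is a minimal sketch assuming the common definition: the Shannon entropy of the distribution of a word's immediate left (or right) neighbors across the corpus. It assumes the texts are already segmented into tokens, and the function names `branch_entropy` and `left_right_entropy` are hypothetical.

```python
import math
from collections import Counter


def branch_entropy(neighbors):
    """Shannon entropy (natural log) of a list of neighboring words."""
    counts = Counter(neighbors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in counts.values())


def left_right_entropy(texts, word):
    """Return (left entropy, right entropy) of `word` over tokenized texts.

    Assumption: only immediate neighbors count, and a text boundary
    contributes no neighbor on that side.
    """
    left, right = [], []
    for tokens in texts:
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            if i > 0:
                left.append(tokens[i - 1])
            if i + 1 < len(tokens):
                right.append(tokens[i + 1])
    return branch_entropy(left), branch_entropy(right)
```

A word that follows (or precedes) many different words gets a high entropy on that side, which indicates it combines freely with varied contexts; a word that always appears next to the same neighbor gets an entropy of zero on that side.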
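The abstract describes word weighting as a TextRank-style computation over an influence transfer matrix, without specifying how the three factors enter the matrix. The sketch below is one hedged reading, not the thesis's exact method. It assumes that two words are linked when they co-occur in the same short text, that frequency, left/right entropy and coverage have already been folded into one positive score per word (the hypothetical `factor_score` dict), and that each word passes its weight to its neighbors in proportion to their scores instead of uniformly as in plain TextRank. The update is the usual damped iteration W_i = (1 - d) + d * sum_j M_ij W_j.

```python
from collections import defaultdict


def textrank_word_weights(texts, factor_score, d=0.85, max_iter=100, tol=1e-8):
    """Damped TextRank iteration over a word co-occurrence graph.

    texts: list of token lists (pre-segmented short texts).
    factor_score: hypothetical dict mapping each word to a positive score,
        assumed to combine frequency, left/right entropy and coverage.
    """
    # Two words are linked if they co-occur in the same short text.
    neighbors = defaultdict(set)
    for tokens in texts:
        uniq = set(tokens)
        for a in uniq:
            neighbors[a].update(uniq - {a})
    words = sorted({w for tokens in texts for w in tokens})
    if not words:
        return {}
    # Normalizer for column j: total score of the words j passes weight to.
    out_total = {j: sum(factor_score[k] for k in neighbors[j]) for j in words}
    weight = {w: 1.0 for w in words}
    for _ in range(max_iter):
        new = {}
        for i in words:
            # Transfer entry M[i][j] = factor_score[i] / out_total[j].
            incoming = sum(
                factor_score[i] / out_total[j] * weight[j] for j in neighbors[i]
            )
            new[i] = (1 - d) + d * incoming
        delta = max(abs(new[w] - weight[w]) for w in words)
        weight = new
        if delta < tol:
            break
    return weight
```

With uniform scores this reduces to ordinary TextRank on the co-occurrence graph; for the texts `[["a", "b"], ["b", "c"]]`, the shared word `"b"` ends up with the largest weight. Because each column of the transfer matrix sums to 1, the scores only change how weight is redistributed, not its total.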
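The similarity step recasts comparison of two short texts as moving all words of one text onto the other at minimum total distance, which is the idea behind Word Mover's Distance. Solving the exact transport problem needs an optimization solver, so the sketch below swaps in the simpler relaxed lower bound: every word moves entirely to its nearest word in the other text, with mass proportional to its word weight. The toy embeddings, the `relaxed_move_cost` and `short_text_similarity` names, and the mapping `1 / (1 + cost)` from cost to similarity are all assumptions for illustration.

```python
import math


def relaxed_move_cost(src, dst, emb, weight):
    """Weighted cost of moving each word of `src` to its nearest word in `dst`.

    A word's mass is its weight divided by the total weight of `src`;
    repeated tokens add mass. Dropping the capacity constraints of the full
    word-moving problem makes this a lower bound on the exact cost.
    """
    total = sum(weight[w] for w in src)
    return sum(
        weight[w] / total * min(math.dist(emb[w], emb[v]) for v in dst)
        for w in src
    )


def short_text_similarity(a, b, emb, weight):
    """Map the larger of the two one-directional costs into (0, 1]."""
    cost = max(
        relaxed_move_cost(a, b, emb, weight),
        relaxed_move_cost(b, a, emb, weight),
    )
    return 1.0 / (1.0 + cost)
```

Taking the larger of the two directions gives the tighter of the two lower bounds. Identical texts score exactly 1.0, and texts whose words sit near each other in embedding space score higher than texts with distant words.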
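The clustering step described above (local density, distance to the nearest denser text, then one assignment pass) resembles the density peaks clustering method of Rodriguez and Laio (2014). The sketch below follows that scheme under stated assumptions: a precomputed symmetric distance matrix (for example, 1 minus a similarity score), a cutoff-kernel density with radius `d_c`, and hypothetical thresholds `rho_min` and `delta_min` for choosing centers. The thesis's own density estimate and center-selection rule may differ.

```python
def density_peaks(dist, d_c, rho_min, delta_min):
    """Single-pass density-peaks clustering on a symmetric distance matrix.

    Returns (labels, centers): labels[i] is the cluster id of item i and
    centers[k] is the index of the item chosen as the center of cluster k.
    """
    n = len(dist)
    # Local density: count of other items closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Decreasing density; ties broken by index so "denser" is a strict order.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    nearest_denser = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # usual convention for the densest item
        else:
            j = min(order[:rank], key=lambda k: dist[i][k])
            delta[i], nearest_denser[i] = dist[i][j], j
    labels = [-1] * n
    centers = []
    # Visiting items densest-first guarantees nearest_denser[i] is labeled already.
    for rank, i in enumerate(order):
        if rank == 0 or (rho[i] >= rho_min and delta[i] >= delta_min):
            labels[i] = len(centers)
            centers.append(i)
        else:
            labels[i] = labels[nearest_denser[i]]
    return labels, centers
```

Centers are items that are both dense and far from any denser item, so the number of clusters falls out of the thresholds rather than being fixed in advance. Once centers are known, every other item inherits the label of its nearest denser neighbor in a single pass, with no iterative reassignment as in k-means.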
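The final extraction step ranks the short texts inside each cluster by weight and keeps the top ones. The abstract says only that a text's weight is computed from the weights of its words; this sketch assumes the sum over the text's distinct words (a length-normalized mean is an equally plausible reading) and a hypothetical `per_cluster` parameter for how many texts to keep from each cluster. It accepts cluster labels from any clustering step and returns the indices of the selected texts.

```python
from collections import defaultdict


def extract_summary(texts, labels, word_weight, per_cluster=1):
    """Pick the highest-weighted short texts from each cluster.

    Assumption: a short text's weight is the sum of the weights of its
    distinct words; words missing from `word_weight` contribute 0.
    """

    def text_weight(tokens):
        return sum(word_weight.get(w, 0.0) for w in set(tokens))

    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    summary = []
    for label in sorted(clusters):
        ranked = sorted(
            clusters[label], key=lambda i: text_weight(texts[i]), reverse=True
        )
        summary.extend(ranked[:per_cluster])
    return summary
```

Drawing texts from every cluster, rather than taking the globally highest-weighted texts, is what lets the summary cover each topic group instead of repeating the dominant one.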

Keywords/Search Tags:

social network, short text, automatic summarization, Word2Vec model, word weight, short text similarity, density-based clustering algorithm

Related items

1	Research On Short Text Automatic Summarization Algorithm Based On TextRank And Word2Vec
2	Social Media Short Text Clustering And Its Applications
3	Research On Internet Short Text Message Oriented Multi-Document Automatic Summarization
4	Research On Short Text Clustering Of Social Networks Based On Word2vec
5	Construction And Automatic Filtering Method Of Large Scale Short Text Summary Data Set
6	Research On Technologies And Methods Of User-oriented Short Text
7	Joint Scoring Automatic Text Summarization Generation Based On TextRank Algorithm
8	Clustering Algorithm Research Of Short Text Based On Semantic Similarity
9	Research On Chinese Short Text Classification Based On Word Embedding
10	Research And Application Of Topic-based Automatic Summarization Of Short Text