Topic Discovery Research Oriented To News Text

Posted on:2019-05-02

Degree:Master

Type:Thesis

Country:China

Candidate:T T Wang

Full Text:PDF

GTID:2405330551458541

Subject:Applied Statistics

Abstract/Summary:

With the development of science and technology, human beings have entered the era of big data. As a result, the Internet produces a large volume of network information, and this information is disorganized. Finding the content that users are interested in within it is a popular and difficult problem in the field of text mining. In recent years, research on topic discovery has been based almost entirely on the Vector Space Model (VSM) and the LDA (Latent Dirichlet Allocation) model. However, improving clustering quality remains a fundamental problem in topic discovery from news reports. This thesis therefore applies three models to topic discovery and analyzes them: the Vector Space Model, the binary Co-occurrence Latent Semantic Vector Space Model (CLSVSM), and the LDA topic model.

First, because the Vector Space Model has some shortcomings, this thesis constructs it based on part-of-speech extraction. Then, using TF-IDF term weighting, it clusters the texts with the K-means method and with agglomerative hierarchical clustering and compares the results. Second, since the Co-occurrence Latent Semantic Vector Space Model can greatly improve the accuracy of text clustering compared with the Vector Space Model, the thesis applies the binary Co-occurrence Latent Semantic Vector Space Model to topic discovery and compares it with the other two models in terms of clustering effect and topic recognition. Finally, experiments are conducted on texts selected from the Sogou news corpus, and the F-measure is used to evaluate the clustering results.

The experiments lead to the following conclusions. Within the Vector Space Model, the clustering results obtained with part-of-speech extraction are more accurate, but they are not as good as those of the LDA topic model and the binary Co-occurrence Latent Semantic Vector Space Model. There is no significant difference in clustering quality between the LDA model and the binary Co-occurrence Latent Semantic Vector Space Model. The results verify the effectiveness of constructing the Vector Space Model in combination with part of speech, and show that applying the binary Co-occurrence Latent Semantic Vector Space Model to topic discovery is reasonable and effective. Furthermore, the thesis draws on the characteristics of the three models to extract topic words from each category, using a different extraction method for each model. The topic words extracted from each category make it easy to understand the main content of the news texts and to identify the main topics they contain.
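The abstract describes building a Vector Space Model with TF-IDF weights and clustering it with K-means. The following is a minimal Python sketch of those two steps only, not the thesis's implementation. It assumes the news texts have already been segmented and filtered to content words by a part-of-speech tagger (not shown), uses the common weighting tf × log(N / df) with cosine similarity, and seeds K-means with the first k documents so results are deterministic. These choices and all function names (`tfidf_vectors`, `cosine`, `kmeans`) are illustrative assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (term -> weight) for already tokenized documents.

    Assumed variant: weight = raw term frequency * log(N / document frequency).
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Terms present in every document get weight 0 and are dropped.
        vectors.append({t: w for t, w in vec.items() if w > 0})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Component-wise mean of sparse vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)  # adds weights term by term
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans(vectors, k, max_iter=20):
    """Spherical-style K-means: assign by max cosine, recompute mean centroids."""
    centroids = [dict(v) for v in vectors[:k]]  # simplification: first k docs as seeds
    labels = [None] * len(vectors)
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:  # converged: no document changed cluster
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    return labels


if __name__ == "__main__":
    docs = [
        ["stock", "market", "shares"],
        ["football", "match", "goal"],
        ["market", "stock", "investors"],
        ["goal", "football", "league"],
    ]
    print(kmeans(tfidf_vectors(docs), k=2))  # → [0, 1, 0, 1]
```

In practice the seeding would usually be randomized or use k-means++, and the agglomerative hierarchical method mentioned in the abstract would reuse the same vectors and similarity function.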
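The abstract evaluates clustering with the F-measure but does not state the formula. A widely used clustering F-measure, assumed here since the thesis's exact variant is not given, matches each reference class i with the cluster j that maximizes F(i, j) = 2PR / (P + R), where P = n_ij / n_j and R = n_ij / n_i, and weights each class by its size: F = Σ_i (n_i / n) · max_j F(i, j). A minimal Python sketch (the function name is hypothetical):

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_labels):
    """Overall F-measure of a clustering against reference classes.

    For class i and cluster j: P = n_ij / n_j, R = n_ij / n_i, F = 2PR / (P + R).
    Overall F = sum over classes of (n_i / n) * max_j F(i, j).
    """
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_labels)
    joint = Counter(zip(true_labels, cluster_labels))  # n_ij counts
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            if n_ij == 0:
                continue  # F is 0 when class and cluster share no documents
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += n_i / n * best
    return total


if __name__ == "__main__":
    gold = ["finance", "finance", "sports", "sports"]
    print(clustering_f_measure(gold, [0, 0, 1, 1]))            # → 1.0
    print(round(clustering_f_measure(gold, [0, 0, 0, 1]), 3))  # → 0.733
```

A perfect clustering scores 1.0; merging part of one class into another lowers both the precision of the merged cluster and the recall of the split class.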

Keywords/Search Tags:

Topic discovery, Text clustering, LDA topic model, Co-occurrence Latent Semantic Vector Space Model, Vector Space Model

Related items

1	A Study Of Ancient Chinese Vocabulary Based On Vector Space Model
2	Cambodian Named Entity Recognition Based On The Topic Model Word Vector
3	Research And Implement Of Classical Poetry Artistic Conception Classification
4	Research And Application Of Movie Recommendation System Based On User Clustering And Time Based Latent Factor Model
5	Research On Deep Learning Automatic Composition Based On MIDI Music
6	An Applied Research Of Rule Space Model In Identification Of Statistic Study Pattern
7	A Study On The Styles Of Ge Fei And Yu Hua's Novels Based On Text Mining
8	Research On Generalized Model Of Chinese Couplet Based On Recurrent Neural Networks
9	A computational model of word and sentence meaning
10	Toward A Cognitive Model Of Deixis