Research And Implementation Of Web Text Summarization System Based On LDA Topic Model

Posted on: 2018-11-06
Degree: Master
Type: Thesis
Country: China
Candidate: G C Sun
GTID: 2348330518498510
Subject: Computer application technology
Abstract/Summary:
The arrival of the Internet era has led to explosive growth in web information, and the demand for obtaining the key content of web text quickly is urgent. Automatic summarization algorithms and software that extract summaries from text have therefore become a hot topic in the field of information processing. After analyzing and summarizing the problems of existing automatic summarization algorithms and software, this dissertation proposes a new similarity algorithm and the LDA-TDTI sentence ranking algorithm, both based on the LDA topic model, to address the main problems of existing similarity and sentence ranking algorithms. An automatic summarization system is then developed in combination with a quality and safety risk monitoring system for imported and exported textile raw materials.

First, the existing similarity calculation methods are summarized. After comparing the advantages and disadvantages of various similarity algorithms, a new similarity calculation method is proposed. The algorithm builds on the theory of latent Dirichlet allocation (LDA) and constructs a topic space model in which characters, words, sentences, documents, and the corpus are all expressed as vectors. Experimental results show that the algorithm achieves dimensionality reduction, avoids the use of external dictionaries, and avoids problems such as the missing semantics of out-of-vocabulary words.

Second, the dissertation proposes the LDA-TDTI sentence ranking algorithm, based on the LDA topic model, to address the problem that the topic distribution of a document in the LDA model can be even, with the topics treated as unrelated. The algorithm uses the similarity between a sentence's topic distribution and the topic importance as the criterion for calculating the importance of the sentence. The higher this similarity, the better the sentence represents the subject of the article. Theoretical analysis and experimental verification show that the algorithm improves both the coverage and the quality of the summary.

Third, based on the research results for the sentence ranking algorithm, object-oriented (OO) methods and UML are used to analyze and design an automatic summarization system. The main software models of the system are given, including the use case diagram, E-R diagram, architecture design, physical database structure, class diagram, application interface, and interaction diagram, and the implementation scheme of the main modules is described. The system can quickly extract a summary of web text, and the quality of the summary is relatively high.

Finally, the automatic summarization system is applied to the quality and safety risk monitoring system for imported and exported textile raw materials. The results show that the system can automatically extract summaries of web text, that the user experience is good, and that, compared with systems of the same type, the quality of the extracted summaries is significantly improved.
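
The ranking idea described above (score each sentence by how similar its topic distribution is to a topic-importance vector, then sort) can be illustrated with a minimal Python sketch. The abstract does not give the LDA-TDTI formulas, so the use of cosine similarity, the function names `cosine` and `rank_sentences`, and the toy topic vectors are all assumptions made for illustration. This is not the dissertation's actual algorithm, and in practice the vectors would come from a trained LDA model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_sentences(sentence_topics, topic_importance):
    """Return sentence indices ordered from most to least similar to topic_importance.

    sentence_topics: list of per-sentence topic distributions (hypothetical input).
    topic_importance: one weight per topic (hypothetical input).
    Ties are broken by original sentence order.
    """
    scored = [(cosine(vec, topic_importance), idx)
              for idx, vec in enumerate(sentence_topics)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored]


if __name__ == "__main__":
    # Toy values over three topics; assumed for illustration only.
    topic_importance = [0.6, 0.3, 0.1]
    sentence_topics = [
        [0.7, 0.2, 0.1],  # sentence 0: mostly about the dominant topic
        [0.1, 0.1, 0.8],  # sentence 1: mostly about a minor topic
        [0.3, 0.6, 0.1],  # sentence 2: mostly about the second topic
    ]
    print(rank_sentences(sentence_topics, topic_importance))
```

A summary would then be built from the top-ranked sentences. How many sentences to keep, and how to avoid redundant ones, are design choices the abstract does not specify.
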
Keywords/Search Tags: LDA topic model, sentence ranking, similarity, automatic summarization