Research On The Construction Method Of Technology Domain Thematic Library Based On Multilevel Topic Vector

Posted on: 2020-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: X S Chen
GTID: 2428330605966662
Subject: Computer Science and Technology
Abstract/Summary:
With the explosive growth of scientific and technological achievements, both academia and industry urgently need accurate professional databases and knowledge search platforms so that they can obtain relevant scientific and technological data efficiently and accurately. Although there are many large service platforms for scientific and technological works, they still struggle to provide precise search services for different fields. Thematic databases better support this kind of specific, precise search, but they suffer from problems such as long investment cycles. Therefore, there is an urgent need for an efficient, low-cost, intelligent method for constructing science and technology thematic databases on the basis of science and technology big data. This thesis studies a method for constructing a multi-level topic vector space and a classification algorithm based on that space, and on this basis further studies the construction method and application of thematic libraries in the science and technology domain. The specific contents are as follows:
1. To construct multi-level topic vector spaces for different fields intelligently, a construction method combining an unsupervised clustering algorithm with a text vector representation model is proposed, addressing the limited generality of existing topic vector space construction methods across fields. The model first builds a global word co-occurrence matrix from the unlabeled text dataset of the domain, then obtains the multi-level topic structure with an improved Poisson process infinite relational model. Finally, the GloVe model is used to construct the multi-level topic vector space.
2. For text classification in the multi-level topic vector space, a new text similarity measure is proposed to overcome the high time complexity of classification methods that use words as features. The algorithm represents each text by its topics in the multi-level topic vector space and defines the similarity between two texts as the minimum cost of matching all topics in one text to the topic terms of the other. The semantic relevance between two topics is defined by their Euclidean distance in the topic vector space. Finally, text classification in the domain is carried out with a training set of texts.
3. Building on these results, a system for constructing science and technology domain thematic databases based on the multi-level topic vector space is implemented on a corpus of science and technology achievement texts. The system builds a domain-specific thematic database from the requirements a user submits and retrieves the corresponding science and technology achievements. This solves the problems of long development cycles and difficult maintenance that arise when thematic databases are built manually on the basis of expert systems.
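Step 1 starts from a global word co-occurrence matrix and ends with the GloVe model. The abstract does not give the window size or weighting, so the sketch below assumes the conventions of the original GloVe model: a symmetric context window where a word pair at distance d contributes 1/d, and the weighting function f(x) = (x/x_max)^α with x_max = 100 and α = 0.75. It only computes the matrix and the GloVe objective; the optimizer and the improved Poisson process infinite relational model that produces the topic hierarchy are not reproduced. Function names are hypothetical.

```python
import math
from collections import defaultdict


def cooccurrence_matrix(docs, window=5):
    """Build a sparse, symmetric word co-occurrence matrix.

    docs: list of token lists (an assumed, already tokenized corpus).
    A pair of words at distance d (1 <= d <= window) adds 1/d to X[(w, c)],
    following the GloVe convention (an assumption; the thesis does not say).
    """
    X = defaultdict(float)
    for tokens in docs:
        for i, w in enumerate(tokens):
            # Scan only to the right, then add both directions to keep X symmetric.
            for j in range(i + 1, min(i + window + 1, len(tokens))):
                c = tokens[j]
                weight = 1.0 / (j - i)
                X[(w, c)] += weight
                X[(c, w)] += weight
    return dict(X)


def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(x): damps rare pairs and caps frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def glove_loss(X, w, w_ctx, b, b_ctx):
    """GloVe objective: sum over pairs of f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2.

    w, w_ctx: dict word -> list[float] (word and context vectors)
    b, b_ctx: dict word -> float (word and context biases)
    """
    total = 0.0
    for (i, j), x in X.items():
        dot = sum(p * q for p, q in zip(w[i], w_ctx[j]))
        diff = dot + b[i] + b_ctx[j] - math.log(x)
        total += glove_weight(x) * diff * diff
    return total
```

Minimizing `glove_loss` over the vectors and biases (for example by gradient descent) yields word vectors; how the thesis maps its topic hierarchy onto those vectors is not described in the abstract.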
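Step 2 scores two texts by the minimum cost of matching the topics of one to the topics of the other, with the Euclidean distance between topic vectors as the cost. The abstract does not say how the matching is constrained, so this sketch assumes the simplest relaxation: each topic is matched to its nearest topic in the other text, the costs are averaged, and the two directions are combined by taking the larger one (similar in spirit to the relaxed Word Mover's Distance). Texts are assumed to be given already as lists of topic vectors, the multi-level structure is flattened, and classification uses a 1-nearest-neighbour rule over the training texts. All names are hypothetical.

```python
import math


def euclidean(u, v):
    """Distance between two topic vectors, used as the semantic matching cost."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def match_cost(topics_a, topics_b):
    """Average cost of moving every topic in A to its nearest topic in B.

    Relaxed matching (assumption): any topic in A may map to any topic in B,
    with no one-to-one constraint.
    """
    return sum(min(euclidean(t, s) for s in topics_b) for t in topics_a) / len(topics_a)


def text_distance(topics_a, topics_b):
    """Symmetric text distance: the larger of the two directed matching costs."""
    return max(match_cost(topics_a, topics_b), match_cost(topics_b, topics_a))


def classify(topics, training_set):
    """Return the label of the nearest training text.

    training_set: list of (topic_vectors, label) pairs.
    """
    _, label = min(
        ((text_distance(topics, train_topics), lbl) for train_topics, lbl in training_set),
        key=lambda pair: pair[0],
    )
    return label
```

Each comparison costs O(|A|·|B|·d) for texts with |A| and |B| topics of dimension d. If a strict one-to-one matching were intended instead, the relaxed `match_cost` would be replaced by an assignment or transport solver.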
Keywords/Search Tags: Natural language processing, Unsupervised clustering, Poisson process infinite relational model, Topic vector space, Text classification