
The Application Of Improved Labeled LDA Model In The Classification Of Technical Video Text

Posted on: 2019-08-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Fan
GTID: 2428330623469008
Subject: Computer Science and Technology
Abstract/Summary:
Science and technology videos are an important carrier for creating and disseminating innovative technology and popular science knowledge, but most video data must be processed before it becomes knowledge that can be shared. Automatic annotation and classification of science and technology video text plays an important role in retrieving the latest technologies, communicating new technology trends, and popularizing science. Classification of massive volumes of technology video text has therefore become a focus of natural language processing research.

Current video text classification methods fall into three main groups: those based on the vector space model, those based on keyword extraction, and those based on tags. Because video text is short, the vector space model tends to produce highly sparse representations, which degrades classification. The complex semantics of video text reduce the quality of keyword extraction. Labels, by contrast, are a highly condensed summary of video content and are widely used in classification. Topic models can effectively extract the hidden topics of a text, but their performance on science and technology video text is not ideal and leaves considerable room for improvement. This thesis therefore combines category labels with video text and improves the topic model according to the characteristics of science and technology video text. The main work is built on a domain term library and the Labeled LDA model.

(1) Domain terms are the basic elements of science and technology video text, yet they are easily ignored during classification because of their low frequency, even though they play an important role in conveying video topics. This thesis proposes constructing a domain term library in two parts: first, a basic term library is established by analyzing patents and consulting materials and by using web crawlers; second, a conditional random field (CRF) model is trained to identify new domain terms, which are added to the term library after review.

(2) Text preprocessing is an important preparation step for classification, but traditional methods split domain terms apart and destroy the semantics of the video text. The term library is therefore used in the word segmentation stage, and a preprocessing method suited to science and technology video text is proposed.

(3) The traditional Labeled LDA model is biased toward high-frequency words and cannot handle domain terms. To address this, the Labeled LDA model and the classification process are improved by combining chi-square statistics, a text position weighting algorithm, and the domain term library, with the aim of improving the quality of the topic words. In the training phase, domain terms are processed and their contribution to a topic is divided into two levels; in the classification phase, labels are mapped according to the level of the domain terms in the text to be classified.

Experiments evaluate both domain term identification and the improved topic model. The results show that the improved Labeled LDA model proposed in this thesis achieves higher classification accuracy than the traditional model in most categories.
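The abstract does not say how the chi-square statistic in item (3) is combined with Labeled LDA, so the following is only a minimal Python sketch of the standard chi-square term/category association score that such weighting schemes usually start from. The function name `chi_square`, the `(tokens, label)` document format, and the sample data are assumptions made for illustration, not the thesis's implementation.

```python
def chi_square(term, category, docs):
    """Standard chi-square association between a term and a category.

    docs is a list of (set_of_tokens, label) pairs (a hypothetical format).
    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    a = b = c = d = 0
    for tokens, label in docs:
        has_term = term in tokens
        in_category = label == category
        if has_term and in_category:
            a += 1
        elif has_term:
            b += 1
        elif in_category:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        # The term or the category never occurs (or always occurs):
        # no association can be measured.
        return 0.0
    return n * (a * d - c * b) ** 2 / denominator


# Illustrative toy corpus, not data from the thesis.
sample_docs = [
    ({"laser", "beam"}, "optics"),
    ({"laser"}, "optics"),
    ({"gene"}, "biology"),
    ({"gene", "cell"}, "biology"),
]
laser_score = chi_square("laser", "optics", sample_docs)
```

A higher score means the term is more strongly tied to the category, so a low-frequency domain term that appears only in one category can still score highly. That property is one plausible reason to pair chi-square with a frequency-biased topic model.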
Keywords/Search Tags: Science and technology video text classification, Labels, Domain term extraction, Conditional Random Field, Labeled LDA, Chi-square weighting