Research On Key Technologies Of Text Segmentation

Posted on: 2020-11-02; Degree: Master; Type: Thesis
Country: China; Candidate: Y P Deng
GTID: 2428330590473216; Subject: Computer technology
Abstract/Summary:
Text segmentation divides a long passage of text into several independent sub-topic modules, with high information cohesion within each module and low information coupling between modules. Traditional text processing techniques use either sentences or whole texts as the basic semantic processing unit. The former carries little information and easily loses context; the latter is informative but, limited by model capacity and computing resources, makes it difficult to capture detailed information. The sub-topic module is a compromise between the two: it lets a model make full use of valuable contextual information without losing detail, which can greatly improve the effectiveness and efficiency of text processing.

The text segmentation task was proposed early on. Because annotated data is scarce, unsupervised methods are commonly used for text segmentation in industry. The traditional methods are simple, but their model parameters depend heavily on manual settings, so the final results are not accurate enough. In this paper, we propose an improved scheme for the traditional segmentation model and a new approach that uses supervised learning for the text segmentation task. Compared with the traditional segmentation method, this approach yields a substantial improvement. The key work of this paper comprises the following three points:

1. Constructing a text segmentation corpus: A large number of web texts are collected from People's Daily and Sina columns, and a set of text segmentation annotations is obtained through a small amount of manual annotation, revision, and screening, in order to provide data support for the subsequent supervised learning methods.

2. Implementing a text segmentation method based on machine learning: The processing approach and related techniques of traditional text segmentation methods are presented. On this basis, the fusion rule template is improved, which greatly improves the accuracy with which the model identifies module boundaries.

3. Proposing a text segmentation method based on deep learning: The idea of using supervised learning to solve the text segmentation task is proposed. The text segmentation model takes two forms: classification and sequence labeling. The classification-based method divides text by judging whether the current position is a segmentation point. The sequence-labeling method takes sentences as the basic processing unit, tags the sentence sequence, and divides topic modules according to the sentence tags. Both achieve good results on the test set. In addition, because the amount of labeled data is small, we also apply transfer learning to initialize the underlying network parameters of the sequence labeling model, which further improves the generalization of the segmentation model.
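
The relationship between the classification and sequence labeling formulations can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify a tag set, so the binary scheme below (1 marks a sentence that starts a new topic module, 0 marks a continuation) and the function names `segments_from_tags` and `tags_from_boundaries` are assumptions made purely for illustration. The sketch only shows how predicted tags or boundary decisions would be turned into modules; the models that produce those predictions are out of scope.

```python
from typing import List


def segments_from_tags(sentences: List[str], tags: List[int]) -> List[List[str]]:
    """Group sentences into topic modules from per-sentence tags.

    Assumed (hypothetical) tag scheme: 1 = sentence starts a new module,
    0 = sentence continues the current module.
    """
    if len(sentences) != len(tags):
        raise ValueError("sentences and tags must have the same length")
    segments: List[List[str]] = []
    for sentence, tag in zip(sentences, tags):
        # Open a new module at a boundary tag, or for the very first sentence.
        if tag == 1 or not segments:
            segments.append([sentence])
        else:
            segments[-1].append(sentence)
    return segments


def tags_from_boundaries(boundaries: List[bool]) -> List[int]:
    """Convert classification-style decisions into sentence tags.

    boundaries[i] is True when a segmentation point is predicted between
    sentence i and sentence i + 1, so n sentences have n - 1 decisions.
    The first sentence always starts a module.
    """
    return [1] + [1 if is_boundary else 0 for is_boundary in boundaries]


if __name__ == "__main__":
    sentences = ["s1", "s2", "s3", "s4"]
    tags = tags_from_boundaries([False, True, False])
    print(segments_from_tags(sentences, tags))
```

Under this assumed scheme, the two formulations differ in how predictions are produced (one independent decision per candidate position versus one tag per sentence predicted jointly over the sequence), while the decoding into modules is the same.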
Keywords/Search Tags: Text Segmentation, Supervised Learning, Classification, Sequence Labeling, Deep Learning, Transfer Learning