
Research And Application Of Text Feature Extraction Method Based On Word Co-occurrence Network

Posted on: 2019-05-01  Degree: Master  Type: Thesis
Country: China  Candidate: P L Guo
GTID: 2438330572951127  Subject: Engineering
Abstract/Summary:
Text feature extraction is an important topic in Natural Language Processing research. Its aim is to extract from a text the linguistic components that express its features. As Internet data grows, more and more data must be processed and stored. Massive text data not only require substantial hardware support but also demand higher performance and accuracy in text processing. Efficiently extracting valuable content from text both reduces the volume of stored data and yields more valuable information, providing a data foundation for downstream Natural Language Processing tasks such as classification and clustering. Texts from different vertical domains have different attributes and characteristics. Mining the attributes and characteristics relevant to a given text processing task more quickly and effectively is both a focus of Natural Language Processing research and an important practical need for applying artificial intelligence in industry.

Building on existing research, this paper carries out extensive work on the performance and efficiency of text feature extraction. In the context of Internet public opinion and e-commerce, it analyzes and compares the strengths and weaknesses of three families of text feature extraction methods: statistics-based, graph-based, and linguistics-based. It then combines graph-based and statistics-based extraction in a novel way and proposes a text feature word extraction method based on a word co-occurrence network and chi-square statistics. The method comprehensively considers the article's keywords: it exploits the graph model's strengths of rich semantic extraction and high recognition accuracy while retaining the statistical model's good performance and high speed.

To analyze model performance and efficiency, this paper evaluates the strengths and weaknesses of the model from three aspects: data size, time complexity, and application type. The experimental results show that, compared with classic feature extraction algorithms such as traditional tf-idf and TextRank, the method improves accuracy by 5%-10% and performance by 50 percentage points. The model's efficiency does not change with data size or text processing task, so the model has strong task transferability and generalization capability. The algorithm is implemented with the Flask framework: the client sends an HTTP request to the server to obtain real-time model service, the application interface is unified, and distributed processing of multiple datasets is supported.
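The abstract does not specify how the co-occurrence network and the chi-square statistic are combined, so the following is only a minimal Python sketch under stated assumptions, not the thesis's implementation. It assumes: text is already tokenized (e.g. Chinese word segmentation done beforehand); the graph score is a TextRank-style weighted PageRank over a sliding-window co-occurrence network; chi-square is computed from a 2x2 table of document counts per term and class; and the two min-max-normalized scores are blended linearly with a weight `alpha`. All function names, the window size, and `alpha` are hypothetical illustrations.

```python
from collections import Counter, defaultdict


def cooccurrence_graph(tokens, window=3):
    """Undirected weighted word co-occurrence network (assumed construction).

    Two distinct words are linked when they appear within `window` tokens of
    each other; the edge weight counts how often that happens.
    """
    graph = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                graph[word][other] += 1
                graph[other][word] += 1
    return graph


def textrank_scores(graph, damping=0.85, iterations=50):
    """Weighted PageRank (TextRank-style) centrality of each word in the graph."""
    nodes = list(graph)
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Each neighbour m passes on a share of its score proportional to edge weight.
        score = {
            n: (1 - damping) + damping * sum(
                graph[m][n] / out_weight[m] * score[m] for m in graph[n]
            )
            for n in nodes
        }
    return score


def chi_square(docs, labels, target):
    """Chi-square statistic of every term with respect to class `target`.

    Uses the 2x2 document-count contingency table:
    a = target docs containing the term, b = other docs containing it,
    c = target docs without it,        d = other docs without it.
    """
    n = len(docs)
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets)
    n_target = sum(1 for y in labels if y == target)
    scores = {}
    for term in vocab:
        a = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == target)
        b = sum(1 for s, y in zip(doc_sets, labels) if term in s and y != target)
        c = n_target - a
        d = n - n_target - b
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def normalize(scores):
    """Min-max scale scores to [0, 1] so the two signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def extract_features(doc, docs, labels, target, top_k=5, alpha=0.5, window=3):
    """Rank the words of `doc` by a hypothetical linear blend of both scores."""
    graph_score = normalize(textrank_scores(cooccurrence_graph(doc, window)))
    chi_score = normalize(chi_square(docs, labels, target))
    combined = {
        t: alpha * graph_score.get(t, 0.0) + (1 - alpha) * chi_score.get(t, 0.0)
        for t in set(doc)
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(combined, key=lambda t: (-combined[t], t))[:top_k]


# Toy, made-up e-commerce reviews (pre-tokenized) with sentiment labels.
docs = [
    ["good", "fast", "delivery"],
    ["good", "cheap", "price"],
    ["bad", "slow", "delivery"],
    ["bad", "broken", "price"],
]
labels = ["pos", "pos", "neg", "neg"]
print(extract_features(docs[0], docs, labels, "pos", top_k=2))  # ['good', 'fast']
```

In this toy example the three words of the first review form a triangle, so their graph scores tie and the chi-square term decides the ranking: "good" occurs in every positive review and no negative one, while "delivery" occurs equally in both classes and scores zero. Note that chi-square measures association in either direction, so a word such as "bad" also scores high for the positive class; a real system would likely need a sign check or per-class filtering.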
Keywords/Search Tags: co-occurrence network, feature extraction, keyword, abstract