Research And Application Of Text Feature Extraction Method Based On Word Co-occurrence Network

Posted on:2019-05-01

Degree:Master

Type:Thesis

Country:China

Candidate:P L Guo

Full Text:PDF

GTID:2438330572951127

Subject:Engineering

Abstract/Summary:

Text feature extraction is an important topic in Natural Language Processing research; its goal is to extract from a text the language components that express its features. As Internet data grows, more and more data must be handled and stored. Massive text data not only demands substantial hardware support but also requires higher performance and accuracy in text processing. Efficiently extracting valuable content from text both reduces the size of the stored data and surfaces more valuable information, which provides a data foundation for follow-up work such as classification and clustering. Texts from different vertical fields have different attributes and characteristics. How to mine the attributes and characteristics relevant to a text processing task quickly and effectively is both a focus of Natural Language Processing research and an important need in industrial applications of artificial intelligence.

Building on existing research results, this thesis concentrates on the performance and efficiency of text feature extraction. In the context of Internet public opinion and e-commerce, it analyzes and compares the pros and cons of three families of text feature extraction methods: statistics-based, graph-based, and linguistics-based. It then combines the graph-based and statistics-based approaches and proposes a feature word extraction method based on a word co-occurrence network and chi-square statistics. The method considers the article's keywords comprehensively: it exploits the rich semantic extraction and high recognition accuracy of the graph model while retaining the good performance and high speed of the statistical model.

In analyzing model performance and efficiency, the thesis evaluates the advantages and disadvantages of the model from three aspects: data size, time complexity, and application type. The experimental results show that, compared with classic feature extraction algorithms such as TF-IDF and TextRank, the method improves accuracy by 5%-10% and performance by 50 percentage points. The model's efficiency does not vary with data size or text processing task, so the model has strong task transfer and generalization capabilities. The algorithm is implemented with the Flask framework: the client sends an HTTP request to the server, which provides the model as a real-time service, unifies the application interface, and supports distributed data processing.
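
As a rough illustration of how a word co-occurrence network and chi-square statistics might be combined, the following minimal Python sketch builds a sliding-window co-occurrence graph, scores each word by its normalised weighted degree, and blends that score with the word's chi-square statistic for the document's class. The abstract does not give the thesis's actual formulas, so the function names, the `window` and `alpha` parameters, the use of weighted degree as the graph score, and the linear blend are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def cooccurrence_graph(tokens, window=3):
    """Undirected weighted graph; an edge counts how often two distinct
    words appear within `window` consecutive tokens of each other."""
    graph = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word][other] += 1
                graph[other][word] += 1
    return graph


def weighted_degree(graph):
    """Weighted degree of each node, scaled so the largest value is 1.0.
    (Assumed stand-in for whatever graph score the thesis uses.)"""
    strength = {w: sum(nbrs.values()) for w, nbrs in graph.items()}
    top = max(strength.values(), default=0)
    return {w: s / top for w, s in strength.items()} if top else {}


def chi_square(term, label, docs):
    """Chi-square statistic of the 2x2 term/class contingency table.
    `docs` is a list of (token_list, class_label) pairs."""
    a = b = c = d = 0
    for tokens, doc_label in docs:
        has_term = term in tokens
        in_class = doc_label == label
        if has_term and in_class:
            a += 1  # term present, document in class
        elif has_term:
            b += 1  # term present, document in another class
        elif in_class:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document in another class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def extract_features(doc_tokens, label, corpus, top_k=5, alpha=0.5, window=3):
    """Rank the words of one document by a blend of graph and chi-square
    scores (hypothetical linear combination weighted by `alpha`)."""
    centrality = weighted_degree(cooccurrence_graph(doc_tokens, window))
    chi = {w: chi_square(w, label, corpus) for w in centrality}
    top_chi = max(chi.values(), default=0)
    scores = {
        w: alpha * centrality[w]
        + (1 - alpha) * (chi[w] / top_chi if top_chi else 0.0)
        for w in centrality
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

With a toy labelled corpus such as `[(["phone", "battery", "screen"], "tech"), (["dress", "cotton", "size"], "fashion")]`, calling `extract_features(corpus[0][0], "tech", corpus)` returns that document's words ordered by the blended score. Setting `alpha` closer to 1 favours the graph score, and closer to 0 favours the chi-square score.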

Keywords/Search Tags:

co-occurrence network, feature extraction, keyword, abstract

Related items

1	Automatic Abstract Extraction Based On Keyword And Graph Model
2	Keyword Extraction From News Web Pages
3	Research About Term Network Based Keywords Extraction Strategy
4	Research On Keyword Extraction And Improved LSA Based On Co-occurrence Word
5	The Research And Implementation Of Keyword Extraction
6	Research And Application Of Microblog Events Abstract Generation And Evolution Analysis Technology
7	Research On Multi Feature Based Extract Text Keyword Algorithm
8	The Research On Keywords Extraction From Chinese News Web Pages Based On Clustering
9	Skin Classification And Feature Extraction Based On Neural Network Analysis Tool
10	Research On Keyword Extraction From Chinese News Web Pages Based On Compose Features