Core Entity Recognition For Web Articles Based On Tree-LSTM Model

Posted on:2022-10-11

Degree:Master

Type:Thesis

Country:China

Candidate:K Zhou

Full Text:PDF

GTID:2518306566497644

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of the mobile Internet and online social media, the explosive growth of online text has made "information overload" an increasingly serious problem. Much of the information on the Internet is hard to verify as true or false, which further raises the cost of obtaining useful information. The core entity of an article is the main object it describes, that is, the entity that plays the main role among the article's entities. Identifying the core entities of online articles helps readers quickly grasp the main content of an article within a large volume of text and obtain useful information in time. Online articles span many domains and text structures, and their core entities vary in distribution and statistical features, so the semantic features of core entities cannot be described explicitly. In addition, the boundaries of core entity words in an article are difficult to annotate, and long entity words exhibit word combination and word nesting, which makes core entity extraction even harder. Furthermore, identifying the core entity requires finding the main object of description through paragraph-level or whole-text comprehension. Driven by these practical requirements, the thesis studies how long-distance textual information affects core entity recognition in online articles, through the following work.

(1) The BiLSTM-CRF model is widely used in natural language processing tasks because it can capture long-distance dependencies. In practice, because of Chinese word-segmentation issues, BiLSTM-CRF usually uses character embeddings instead of word embeddings, yet character embeddings are not an ideal choice for capturing precise semantics. Therefore, given the characteristics of the task and the word combination and nesting phenomena of core entities, the thesis proposes a word-level BiLSTM-CRF method for article core entity recognition.

(2) Because BiLSTM-CRF has difficulty capturing long-distance textual information in complex semantic contexts, the thesis proposes a core entity recognition method based on a Tree-LSTM-CRF model. Using the syntactic dependencies and hierarchical structure of the article, the model builds a tree-structured dependency representation for text understanding. Through the bottom-up information flow and memory capacity of the Tree-LSTM, the model captures long-distance textual information in the article well, which improves core entity recognition. Experimental results show that the F1 score increases by 11.57% compared with the BiLSTM-CRF model.

(3) To address the insufficient interaction between words and textual information in the Tree-LSTM-CRF model, the thesis further proposes an Attention-Based Tree-LSTM-CRF model, which introduces hierarchical attention based on textual information into Tree-LSTM-CRF. Through information interaction between words and sentences, paragraphs, and articles, the model captures how important each word is to its sentence, paragraph, and article. This strengthens the interaction between sentences and textual information and improves the model's ability to identify core entities. Experimental results show that the F1 score of this improved model increases by 24.58% compared with BiLSTM-CRF and by 11.66% compared with Tree-LSTM-CRF, a further performance gain.
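The CRF layer of the BiLSTM-CRF baseline picks the best tag sequence for the whole word sequence instead of tagging each word independently. The sketch below shows standard Viterbi decoding for a linear-chain CRF over word-level tags. The BIO tag set (`O`, `B-CORE`, `I-CORE`), the function name `viterbi_decode`, and all scores are hypothetical; the abstract does not state the thesis's tagging scheme. In a real model the emission scores would come from the BiLSTM and the transition scores would be learned. Start and end transition scores are omitted for brevity.

```python
from typing import List, Sequence, Tuple

# Hypothetical word-level BIO tag set for core-entity tagging.
TAGS = ["O", "B-CORE", "I-CORE"]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag path of a linear-chain CRF and its score.

    emissions[t][j]: score of tag j for word t (e.g. produced by a BiLSTM).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    num_tags = len(transitions)
    # best[j]: best score of any path that ends in tag j at the current word.
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(num_tags):
            # Predecessor tag i that maximises best[i] + transition(i -> j).
            prev = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace the path back from the best final tag.
    last = max(range(num_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


if __name__ == "__main__":
    # Toy scores for a three-word sequence; O -> I-CORE is heavily penalised.
    emissions = [[0.1, 2.0, 0.3], [0.2, 0.1, 1.5], [1.0, 0.2, 0.4]]
    transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    path, score = viterbi_decode(emissions, transitions)
    print([TAGS[k] for k in path], score)  # ['B-CORE', 'I-CORE', 'O'] 4.5
```

With word-level input, a multi-word core entity is recovered as one `B-CORE` word followed by `I-CORE` words, and the transition penalty keeps the decoder from starting an entity with `I-CORE` after `O`.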
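The bottom-up Tree-LSTM step can be illustrated with the Child-Sum Tree-LSTM of Tai et al. (2015), in which each node combines its own input with the summed hidden states of its children and keeps one forget gate per child. The abstract does not say which Tree-LSTM variant the thesis uses, so the Child-Sum form, the names `ChildSumTreeLSTM`, `node`, and `encode`, the head-index tree format, and the random untrained weights are all assumptions made for illustration. The sketch encodes a dependency tree in post-order, so each head is computed only after all of its dependents.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
State = Tuple[Vector, Vector]  # (hidden state h, memory cell c)


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def vadd(*vs: Sequence[float]) -> Vector:
    return [sum(xs) for xs in zip(*vs)]


def vmul(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [x * y for x, y in zip(a, b)]


def sigmoid(v: Sequence[float]) -> Vector:
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def tanh(v: Sequence[float]) -> Vector:
    return [math.tanh(x) for x in v]


class ChildSumTreeLSTM:
    """Child-Sum Tree-LSTM cell with small random (untrained) weights."""

    def __init__(self, in_dim: int, hid_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> Matrix:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hid_dim = hid_dim
        # One (W, U, b) triple per gate: input i, forget f, output o, update u.
        self.params = {
            g: (mat(hid_dim, in_dim), mat(hid_dim, hid_dim), [0.0] * hid_dim)
            for g in "ifou"
        }

    def _pre(self, gate: str, x: Vector, h: Vector) -> Vector:
        w, u, b = self.params[gate]
        return vadd(matvec(w, x), matvec(u, h), b)

    def node(self, x: Vector, children: List[State]) -> State:
        """Compute (h, c) for one node from its input x and its children's states."""
        h_sum = vadd([0.0] * self.hid_dim, *(h for h, _ in children))
        i = sigmoid(self._pre("i", x, h_sum))
        o = sigmoid(self._pre("o", x, h_sum))
        u = tanh(self._pre("u", x, h_sum))
        c = vmul(i, u)
        for h_k, c_k in children:
            # A separate forget gate per child decides how much of c_k to keep.
            f_k = sigmoid(self._pre("f", x, h_k))
            c = vadd(c, vmul(f_k, c_k))
        h = vmul(o, tanh(c))
        return h, c

    def encode(self, heads: List[int], vectors: List[Vector]) -> Dict[int, State]:
        """Encode a dependency tree bottom-up; heads[k] is token k's head (-1 marks the root)."""
        children: Dict[int, List[int]] = {k: [] for k in range(len(heads))}
        root = -1
        for k, head in enumerate(heads):
            if head < 0:
                root = k
            else:
                children[head].append(k)
        states: Dict[int, State] = {}

        def visit(k: int) -> State:
            # Post-order traversal: every dependent is encoded before its head.
            states[k] = self.node(vectors[k], [visit(child) for child in children[k]])
            return states[k]

        visit(root)
        return states


if __name__ == "__main__":
    cell = ChildSumTreeLSTM(in_dim=4, hid_dim=3)
    heads = [1, -1, 1]  # toy tree: token 1 is the root, tokens 0 and 2 depend on it
    vectors = [[0.1, 0.2, 0.0, -0.1], [0.3, -0.2, 0.1, 0.0], [0.0, 0.1, 0.4, 0.2]]
    states = cell.encode(heads, vectors)
    print(states[1][0])  # root hidden state, built from the whole tree
```

Because the encoder only needs a parent index per node, the same routine could in principle run over a tree whose internal nodes stand for sentences and paragraphs; the abstract does not specify how the thesis links dependency trees into an article-level hierarchy.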
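The hierarchical attention in (3) can be sketched with additive attention pooling of the kind used in hierarchical attention networks (Yang et al., 2016): each state is scored against a query vector, the scores are normalised with a softmax, and the weighted sum becomes the representation one level up (words to sentence, sentences to paragraph, and so on). The function name `attention_pool`, the identity projection, and the query vector below are hypothetical; the abstract does not state how the thesis computes its attention scores, or whether the query is a learned context vector or a sentence, paragraph, or article representation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def attention_pool(
    states: Sequence[Sequence[float]],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
    query: Sequence[float],
) -> Tuple[Vector, List[float]]:
    """Additive attention pooling; returns (weighted summary vector, attention weights)."""
    scores = []
    for h in states:
        # u_t = tanh(W h_t + b), score_t = u_t . query
        u = [math.tanh(sum(w * x for w, x in zip(row, h)) + b) for row, b in zip(weight, bias)]
        scores.append(sum(a * q for a, q in zip(u, query)))
    # Softmax over the sequence, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Weighted sum of the input states (a convex combination).
    dim = len(states[0])
    summary = [sum(a * h[d] for a, h in zip(alphas, states)) for d in range(dim)]
    return summary, alphas


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical projection W
    zero_bias = [0.0, 0.0]
    query = [1.0, 0.0]  # hypothetical query / context vector
    # Toy word states for two sentences (these might be Tree-LSTM outputs).
    sentences = [[[2.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, -1.0]]]
    # Level 1: words -> sentence vectors.
    sentence_vecs = [attention_pool(words, identity, zero_bias, query)[0] for words in sentences]
    # Level 2: sentences -> article vector (the paragraph level is omitted for brevity).
    article_vec, sentence_weights = attention_pool(sentence_vecs, identity, zero_bias, query)
    print(article_vec, sentence_weights)
```

The attention weights at each level are what expose "how important a word is to its sentence" (and a sentence to its article); stacking the same pooling function keeps every level of the hierarchy structurally identical.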

Keywords/Search Tags:

Core entity recognition, Tree-LSTM-CRF model, Attention mechanism

Related items

1	Joint Extraction Of Named Entity Recognition And Entity Relationship Based On Neural Network
2	Research On Named Entity Recognition Based On Deep Learning
3	The Research And Implementation Of Named Entity Recognition For Chinese Social Media
4	Research On Entity Relation Extract Based On LSTM
5	Research And Implementation Of Chinese Named Entity Recognition Based On Lattice-LSTM Model
6	Entity Relationship Extraction Based On Bi-LSTM And Attention Mechanism
7	Named Entity Recognition Based On LSTM With Hierarchical Residual Connection
8	Research On Chinese Named Entity Recognition Algorithm Based On Remote Semantics And LSTM-CRF
9	Research On Named Entity Recognition With Deep Learning
10	Research On Named Entity Recognition Combining Residual Structure And Attention Mechanism