Research And Application Of Person Figure Mining Based On Text Analysis

Posted on: 2017-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z J Li
GTID: 2308330485985006
Subject: Computer software and theory
Abstract/Summary:
The rapid development of network technology and the popularity of the Internet have made people increasingly dependent on exchanging and sharing information online, and the demand for obtaining information about people from the Internet is growing steadily. However, the sheer volume of online data leaves person information messy and fragmented. Existing work on person profiling has focused mainly on extracting person attributes, whereas users want to obtain all aspects of a person’s information easily and quickly.

This thesis studies person figure mining, taking news text as the processing target and focusing on three aspects: extraction of a person’s social relations, tracking of the events a person participates in, and analysis of a person’s hotness and sentiment.

For relation extraction, the thesis first builds a thesaurus of person relationships by expanding a seed dictionary with the Tongyici Cilin synonym thesaurus, which avoids the inefficiency of compiling the dictionary by hand. It then proposes an improved relation extraction algorithm that combines rule matching with syntax trees, which overcomes the low recall of rule matching alone. In the experiments the algorithm reaches an average F-score of 82.61%, a clear advantage over the other methods compared.

For event tracking, the thesis improves three steps: text feature extraction, feature dimension reduction, and text similarity calculation. Each text is represented by a triple of vectors (title, person names, and body content), and the weighted sum over the three vectors is taken as the text similarity. A time attenuation factor is introduced during text clustering to preserve the temporal nature of events.
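The triple-vector similarity with time attenuation can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything specific here is an assumption: per-field cosine similarity over bag-of-words counts, the weights `(0.3, 0.3, 0.4)`, an exponential decay with rate `decay`, and the names `cosine` and `doc_similarity` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def doc_similarity(d1, d2, weights=(0.3, 0.3, 0.4), decay=0.1):
    """Weighted sum of title, person-name and content similarity,
    damped by the publication-time gap (in days).

    The weights and the exponential decay are illustrative assumptions,
    not values taken from the thesis.
    """
    fields = ("title", "persons", "content")
    base = sum(
        w * cosine(Counter(d1[f]), Counter(d2[f]))
        for w, f in zip(weights, fields)
    )
    gap_days = abs(d1["day"] - d2["day"])
    return base * math.exp(-decay * gap_days)


doc_a = {
    "title": ["mayor", "visits", "school"],
    "persons": ["li"],
    "content": ["mayor", "li", "visits", "a", "local", "school"],
    "day": 0,
}
doc_b = dict(doc_a, day=10)  # same text, published ten days later

print(doc_similarity(doc_a, doc_a))
print(doc_similarity(doc_a, doc_b))
```

With this kind of damping, two reports with identical wording still drift apart as their publication dates diverge, which keeps a clustering step from merging recurring but distinct events.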
For hotness and sentiment analysis, the thesis first analyzes the factors that influence a person’s hotness and gives a specific formula for computing the heat value. It then applies a lexicon-based method to analyze sentiment tendency, which obtains good results on a medium-sized test corpus.

Through these three aspects, the thesis organizes the person information scattered across the network into a person figure. The results can be applied to people search systems, tracking of specific targets, detection of well-known people on the web, and similar tasks, providing convenience in work and daily life. Future work consists of two parts: reducing the time complexity of the relation extraction algorithm and introducing semantic analysis for a deeper study of sentiment tendency.
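As a rough illustration of what a lexicon-based sentiment method looks like, the sketch below scores a tokenized sentence against small word lists. The lexicon entries, the negation handling, and the names `sentiment_score` and `sentiment_label` are hypothetical; the thesis works on Chinese news text and its actual lexicon and scoring rules are not described in the abstract.

```python
# Tiny illustrative lexicon; a real system would load a much larger one.
POSITIVE = {"praised", "won", "success", "good"}
NEGATIVE = {"criticized", "lost", "scandal", "bad"}
NEGATORS = {"not", "never", "no"}


def sentiment_score(tokens):
    """Sum word polarities; a negator flips only the word right after it."""
    score = 0
    negate = False
    for tok in tokens:
        word = tok.lower()
        if word in NEGATORS:
            negate = True
            continue
        polarity = (word in POSITIVE) - (word in NEGATIVE)  # +1, -1 or 0
        if polarity:
            score += -polarity if negate else polarity
        negate = False  # negation does not carry past the next token
    return score


def sentiment_label(tokens):
    """Map the numeric score to a coarse sentiment tendency."""
    score = sentiment_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment_label("The director won a major award".split()))
print(sentiment_label("The result was not good".split()))
```

A pure lexicon lookup of this kind misses sarcasm and context-dependent polarity, which is consistent with the abstract naming semantic analysis as future work.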
Keywords/Search Tags: Relation Extraction, Dependency Syntactic Parsing, Text Feature Extraction, Sentiment Analysis