
Research On The Relevance Identification Of Chinese News Subject Events

Posted on: 2017-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: P P Liu
GTID: 2278330488965649
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, the web has become the main carrier and medium for news events. News websites publish a large volume of updates every day, which makes it difficult for users to find useful information. By identifying the relationships between the subject events of news reports, users can obtain the key elements of a news event and its related events in a short time, which reduces the amount of news they need to read. Studying the relationships between events is therefore of practical significance.

Research on event relations has long been a hot and difficult topic in information extraction. Current work mainly targets deep-level event relation detection, that is, identifying and extracting specific types of event relations such as temporal and causal relations. Shallow-level event relation detection, namely event correlation recognition, has received little attention. The few existing studies on news event correlation recognition achieve low accuracy, both because Chinese news events are complex and because those studies consider only a single correlation factor. To overcome the one-sidedness of a single influencing factor, this thesis proposes several factors that affect correlation, based on the characteristics of Chinese news events, and uses multi-factor analysis to recognize event correlation. The main work covers the following three aspects.

(1) Extraction of web news text based on text features and tag attributes. Mainstream media now publish news events on the Internet and the online sources are abundant, so extracting web news has become a research focus. For the news title, a simple rule-matching method is used. For the news body, a method based on text features and tag attributes is proposed after analyzing the features of news pages. First, the HTML document is preprocessed, which includes filtering useless tags and repairing HTML syntax errors. Then the DOM tree is generated. Finally, the leaf nodes of the DOM tree are filtered according to the characteristics of news text and the tag attributes, which yields the news text.

(2) Extraction of the topic events of news reports based on paragraph orientation. A news report consists of a number of atomic events, all of which center on the topic event of the report, so the topic event needs to be extracted. First, the news text is preprocessed, which includes clause splitting, word segmentation, POS tagging and entity recognition. Then all event sentences in the report are identified and non-event sentences are filtered out. Finally, the topic event of the report is obtained. Based on the characteristics of Chinese news events, this thesis proposes an event sentence recognition approach based on semantic dependency analysis. Building on event sentence recognition, an extraction method based on paragraph location is used to recognize the topic events of news reports.

(3) Recognition of Chinese news event correlation based on grey relational analysis. First, by analyzing the characteristics of Chinese news events, three factors affecting event correlation are proposed: the co-occurrence of triggers, the nouns shared between events, and the semantic similarity of the event sentences. Second, the three factors are quantified and their influence weights are obtained. Finally, grey relational analysis is used to combine the three factors: a grey relational analysis model between events is established, and event correlation recognition is realized.
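The combination step in (3) can be illustrated with a minimal sketch of a standard grey relational grade computation. The abstract does not give the exact formulation, so several details here are assumptions: each candidate event pair is described by three factor scores already scaled to [0, 1], the reference sequence is the ideal pair (1, 1, 1), the distinguishing coefficient `rho` is the commonly used 0.5, and the function name `grey_relational_grades` and the weights in the example are hypothetical.

```python
from typing import List, Sequence


def grey_relational_grades(
    rows: Sequence[Sequence[float]],
    weights: Sequence[float],
    rho: float = 0.5,
) -> List[float]:
    """Return one grey relational grade per row (one row per event pair).

    Assumptions (not taken from the thesis): factor scores lie in [0, 1],
    the reference sequence is all ones, and the weights sum to 1.
    """
    reference = [1.0] * len(weights)

    # Absolute difference between each factor score and the reference.
    deltas = [[abs(r - x) for r, x in zip(reference, row)] for row in rows]

    # Global minimum and maximum difference over all rows and factors.
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)

    # Every row equals the reference: the grade is the sum of the weights.
    if d_max == 0:
        return [float(sum(weights)) for _ in rows]

    grades = []
    for row in deltas:
        # Grey relational coefficient for each factor of this row.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        # Weighted sum of the coefficients gives the grade.
        grades.append(sum(w * c for w, c in zip(weights, coeffs)))
    return grades


if __name__ == "__main__":
    # Columns: trigger co-occurrence, shared nouns, sentence similarity.
    pairs = [
        [0.9, 0.8, 0.7],  # hypothetical strongly related pair
        [0.1, 0.2, 0.3],  # hypothetical weakly related pair
    ]
    print(grey_relational_grades(pairs, weights=[0.4, 0.3, 0.3]))
```

A pair whose grade exceeds a chosen threshold would then be labeled as correlated. The threshold, like the weights above, would have to be determined experimentally.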
Keywords/Search Tags: tag attributes, paragraph orientation, topic events of news report, grey relational analysis