
Research And Implementation Of Chinese News Elements Recognition System

Posted on: 2019-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: X Shi
Full Text: PDF
GTID: 2348330542498622
Subject: Software engineering
Abstract/Summary:
With the rapid development of Internet technology and the growing bandwidth and speed of communication networks, the number of news items has surged and news types have diversified. This growth has also affected the quality of news content to some extent, and readers now need more time and energy than ever to keep up with information. Faced with large volumes of news, people tend to be interested only in its four main elements: what, when, who, and where (the 4W elements). With the 4W elements, readers can grasp the outline of a news event, select the news that interests them, and then read that content in detail. How to extract the 4W elements of the central event from a full news article and present them in a friendly interface, so that users can find the news they care about in less time, is an urgent problem for journalists and software developers. Current developments in machine learning and natural language processing make it possible to extract key content of interest automatically from large amounts of news.

This thesis pursues that goal. Based on a survey of a large body of online news, it summarizes the characteristics of online news and combines them with machine learning and natural language processing techniques. Through sentence segmentation, entity extraction, and key sentence extraction, it finds the key sentence of a news event. On this basis, it identifies the other elements related to the main event, such as people, time, and place, so that the news content can be described in synoptic form. Readers can then understand the main information of a news item directly, without having to read every item in full and waste a great deal of time as before.

Focusing on the 4W element extraction task and driven by the event (what), this thesis studies two key technologies, event sentence extraction and event element extraction, using relevant techniques from natural language processing. It completes the following work. First, it proposes a key event sentence extraction algorithm that combines multi-dimensional features such as sentence-title relevance, stop-word frequency, entities, and sentence length. When determining whether a news headline has referential properties, the Word2vec word vector algorithm is used to expand the title words. Second, combining semantic role labeling and dependency parsing, it studies and summarizes methods for extracting the other news elements associated with event sentences, derives rules for extracting news elements from key event sentences, and implements an element extraction model based on key event sentences. Finally, building on these results, it implements a prototype Chinese news element recognition system with a B/S (browser/server) architecture. The system visualizes news keywords, key event sentences, and event elements, giving users a more intuitive view of the extraction results. The system has been tested and deployed to users. After several months of operation it has proved stable and reliable, produced good results, and been well received by project partners.
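The multi-feature scoring idea behind key event sentence extraction can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: the weights, the `ideal` sentence length, the entity cap, and every function and class name below are assumptions made for illustration, and the input is assumed to be already segmented into words with entities already recognized.

```python
from dataclasses import dataclass

# Hypothetical weights; the abstract does not give the actual values.
WEIGHTS = {"title": 0.5, "content": 0.2, "entity": 0.2, "length": 0.1}


@dataclass
class Sentence:
    tokens: list[str]    # pre-segmented words (segmentation is assumed done)
    entities: list[str]  # named entities found in the sentence (assumed given)


def title_relevance(tokens, title_tokens):
    """Fraction of distinct title words that also occur in the sentence."""
    title = set(title_tokens)
    if not title:
        return 0.0
    return len(title & set(tokens)) / len(title)


def stop_word_ratio(tokens, stop_words):
    """Share of the sentence's tokens that are stop words."""
    if not tokens:
        return 0.0
    return sum(t in stop_words for t in tokens) / len(tokens)


def length_score(tokens, ideal=20):
    """1.0 at the assumed ideal length, falling linearly to 0.0."""
    return max(0.0, 1.0 - abs(len(tokens) - ideal) / ideal)


def score(sentence, title_tokens, stop_words):
    """Weighted sum of the four features; higher means more event-like."""
    content = 1.0 - stop_word_ratio(sentence.tokens, stop_words)
    entity = min(len(sentence.entities), 3) / 3  # cap at 3 entities (assumed)
    return (
        WEIGHTS["title"] * title_relevance(sentence.tokens, title_tokens)
        + WEIGHTS["content"] * content
        + WEIGHTS["entity"] * entity
        + WEIGHTS["length"] * length_score(sentence.tokens)
    )


def key_event_sentence(sentences, title_tokens, stop_words):
    """Return the highest-scoring sentence as the key event sentence."""
    return max(sentences, key=lambda s: score(s, title_tokens, stop_words))
```

In this sketch, `title_tokens` could be enlarged beforehand with nearest neighbours from a word-vector model, which is one way to read the Word2vec title expansion the abstract mentions. A weighted linear sum is used only because it is the simplest way to combine heterogeneous features; the thesis may combine them differently.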
Keywords/Search Tags:natural language processing, information extraction, key event extraction, event elements extraction