
Research And Analysis Of Event Detection And Evaluation System Based On Web

Posted on: 2017-04-29
Degree: Master
Type: Thesis
Country: China
Candidate: B H Yu
Full Text: PDF
GTID: 2348330503495777
Subject: Software engineering
Abstract/Summary:
With the development of the Internet and the growing popularity of social networks, a large amount of user data is now available online. However, this data is generally semi-structured, and news websites and social networks produce large volumes of it every day. Extracting valid data from the web in order to detect events and evaluate users' attitudes toward them has become a hot research problem. This thesis takes Chinese websites and microblogs as its research object. It extracts valid data quickly, applies an event detection algorithm to detect new events, and analyzes users' attitudes toward current microblog topics. Details are as follows:
(1) To quickly extract valid data from massive numbers of semi-structured webpages, we propose an extraction algorithm based on game theory. The tags are first divided into blocks, which form the strategies of player one; the strategy of player two is whether or not to extract a block. Experiments show that our method outperforms DOM tree analysis and visual segmentation algorithms, especially in efficiency, so it can be used in applications such as screen reading.
(2) We propose a method based on the TextRank algorithm that extracts keywords from the text as a feature vector. The text is first segmented into words; 60 keywords are then extracted with TextRank to form the feature vector, and single-pass clustering is performed to detect new events. Experiments show that our method outperforms the TF-IDF method, and the results indicate that its word weights are more reasonable.
(3) We propose a method based on the TextRank algorithm that extracts keywords from the text as candidate words. Keywords are first extracted with TextRank, and the evaluation objects and evaluation words are then extracted. Finally, mutual information is calculated against an emotion dictionary to determine the polarity of the emotion. Experiments show that the accuracy of our method is slightly lower than that of the maximum entropy syntactic parsing method, but its time efficiency is higher, which gives the algorithm a great advantage when dealing with massive data.
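The event detection pipeline in item (2) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: it assumes the text has already been segmented into tokens, and the co-occurrence window, damping factor, cosine similarity, summed centroids, and the 0.3 threshold are all illustrative assumptions. Only the overall flow (TextRank keywords as the feature vector, up to 60 per text, followed by single-pass clustering) comes from the abstract. The function names are hypothetical.

```python
from collections import defaultdict
import math


def textrank_keywords(tokens, top_k=60, window=3, damping=0.85, iterations=50):
    """Score tokens with PageRank over an undirected co-occurrence graph."""
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        # Link each token to the tokens that follow it inside the window.
        for other in tokens[i + 1 : i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:top_k])


def cosine(a, b):
    """Cosine similarity between two sparse keyword-weight vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(docs, threshold=0.3):
    """Assign each document to the most similar cluster, or open a new one."""
    centroids = []  # one summed keyword-weight vector per cluster
    labels = []
    for tokens in docs:
        vec = textrank_keywords(tokens)
        best, best_sim = None, 0.0
        for cid, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = cid, sim
        if best is not None and best_sim >= threshold:
            # Existing event: fold the document into the cluster centroid.
            for w, s in vec.items():
                centroids[best][w] = centroids[best].get(w, 0.0) + s
            labels.append(best)
        else:
            # No cluster is similar enough: treat the document as a new event.
            centroids.append(dict(vec))
            labels.append(len(centroids) - 1)
    return labels


# Toy, pre-segmented documents (assumed input format).
docs = [
    ["earthquake", "strikes", "city", "rescue", "teams", "arrive", "earthquake", "damage"],
    ["rescue", "teams", "search", "earthquake", "damage", "city"],
    ["team", "wins", "football", "final", "fans", "celebrate", "football"],
]
labels = single_pass(docs)
```

In this toy run the first two documents share most of their keywords and fall into one cluster, while the third shares none and opens a new cluster, which is how a single-pass scheme flags a new event. For Chinese text, a word segmenter would have to produce the token lists first.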
Keywords/Search Tags: Webpage Parsing, Text Extraction, Game Theory, Event Detection, Event Evaluation