
Research On Public Sentiment Topic Detection Technology Based On Time Information

Posted on: 2014-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: L Q Liu
GTID: 2268330422450580
Subject: Computer Science and Technology
Abstract/Summary:
As information on the Internet continues to grow exponentially and Internet use becomes increasingly widespread worldwide, the Internet has become the largest and most promising concentration of social media. In this situation, monitoring and analyzing public sentiment is very important. The Internet generates a large amount of information every day, and efficiently mining daily hot topics and emergent events has become a research priority in public sentiment monitoring.

Chinese time expression recognition has been a research hotspot in recent years. Most studies use machine learning methods to identify the extent of a time expression and rule-based methods to normalize it. However, feature selection and rule design remain imperfect, so this paper proposes a feature selection algorithm and develops the normalization rules manually. Existing research on hot topic detection treats time information only superficially. To address this, this paper integrates time information into the topic detection algorithm through the news text representation and the similarity calculation used in clustering. For emergent event recognition, this paper adds time information and considers a time closeness factor. This paper studies public sentiment hot topic detection and emergent event recognition based on normalized time expressions. The main work is as follows.

1. A feature selection algorithm for Chinese time expressions is proposed. Traditional feature selection algorithms inevitably miss the best combination of features. To address this defect, this paper proposes a feature selection method based on the intersection of the feature sets obtained by adding and by subtracting features. The features are further filtered with this method, and the method is verified against the results of an exhaustive feature experiment. CRF, SVM, and maximum entropy models are each applied to the TempEval-2 Chinese corpus, the experimental results are compared and analyzed, and the possible causes of incorrectly labeled data are examined.
The F1-score obtained with this method is higher than those reported by other researchers.

2. Rules are combined with machine learning for Chinese time expression type recognition. This paper adds rules on top of an SVM to recognize the type of a Chinese time expression. Results on the TempEval-2 Chinese corpus support this method: precision reaches 96.88%, which is higher than that of other methods.

3. A normalization format and algorithm for Chinese time expressions are proposed. This paper uses UTC as the standard time and maps news text from different sources to a uniform time zone. To support large-scale real-time analysis of the data, this paper adds the concept of news fetch time to the concept of reference time. The time extents of some vague time words are defined manually, and a normalization format is designed for Chinese time expressions of type DATE and TIME. Based on the above, this paper designs the Chinese time expression normalization algorithm.

4. Time information is added to public sentiment topic detection. This paper incorporates time information into the Single-Pass algorithm through the news text representation model and the similarity calculation model used in clustering. News text is represented with a vector space model using TF-IDF weights. To represent news text more accurately and comprehensively, this paper increases the weights of person names, place names, organization names, time words, and words in the headline and first paragraph. The time words here are the normalized ones. When calculating similarity, in addition to the traditional cosine similarity formula, this paper adds a time distance factor and defines a function of time distance with time granularity down to minutes, which gives a more fine-grained time cluster center than the methods of other researchers.
With the above improvements, the paper conducts experiments on a manually annotated corpus covering 10 categories of topics from a real network environment, verifying that adding time information to public sentiment hot topic detection is effective.

5. An algorithm for emergent event recognition based on time closeness is proposed. To raise alarms for emergent events more accurately and promptly, an emergent event recognition algorithm based on time closeness is proposed, and an emergent event recognition system is designed and implemented.
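
The idea in item 4, Single-Pass clustering whose similarity combines cosine similarity with a time distance factor at minute granularity, can be illustrated with a minimal Python sketch. The abstract does not give the thesis's actual time-distance function, the way the two factors are combined, or the threshold, so everything specific here is an assumption made for illustration: the exponential decay in `time_factor`, its `half_life_minutes` parameter, the multiplicative combination, the running-mean cluster update, and the `threshold` value.

```python
import math
from datetime import datetime, timezone


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def time_factor(t1, t2, half_life_minutes=1440.0):
    """Assumed decay (not from the thesis): 1.0 at zero distance,
    0.5 when the two times are one half-life apart. Distance is in minutes."""
    minutes = abs((t1 - t2).total_seconds()) / 60.0
    return 0.5 ** (minutes / half_life_minutes)


def single_pass(docs, threshold=0.3):
    """Single-Pass clustering over (term_weights, utc_datetime) pairs.

    Each document joins the most similar existing cluster if the combined
    similarity reaches the threshold; otherwise it starts a new cluster.
    """
    clusters = []
    for vec, ts in docs:
        best, best_sim = None, 0.0
        for c in clusters:
            # Assumed combination: cosine similarity scaled by time closeness.
            sim = cosine(vec, c["centroid"]) * time_factor(ts, c["time"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            n = len(best["members"])
            terms = set(best["centroid"]) | set(vec)
            # Running mean of the term weights.
            best["centroid"] = {
                t: (best["centroid"].get(t, 0.0) * n + vec.get(t, 0.0)) / (n + 1)
                for t in terms
            }
            # Running mean of the timestamps: the cluster's time center.
            best["time"] = best["time"] + (ts - best["time"]) / (n + 1)
            best["members"].append((vec, ts))
        else:
            clusters.append(
                {"centroid": dict(vec), "time": ts, "members": [(vec, ts)]}
            )
    return clusters


if __name__ == "__main__":
    utc = timezone.utc
    docs = [
        ({"earthquake": 1.0, "sichuan": 1.0}, datetime(2014, 8, 1, 8, 0, tzinfo=utc)),
        ({"earthquake": 1.0, "sichuan": 0.8}, datetime(2014, 8, 1, 9, 30, tzinfo=utc)),
        ({"earthquake": 1.0, "sichuan": 1.0}, datetime(2014, 8, 20, 8, 0, tzinfo=utc)),
    ]
    for c in single_pass(docs):
        print(c["time"].isoformat(), len(c["members"]))
```

In this sketch the third document uses the same words as the first but arrives about 19 days later, so the assumed decay pushes its combined similarity below the threshold and it starts a separate topic. The term weights are taken as given; the TF-IDF weighting and the boosts for names, places, time words, and headline words described above would be applied before this step.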
Keywords/Search Tags: public sentiment monitoring, Chinese time expression recognition, topic detection, emergent event recognition, public sentiment alarming