Topic Detection And Trend Prediction Of Web Text

Posted on:2014-06-29

Degree:Master

Type:Thesis

Country:China

Candidate:L T Wang

Full Text:PDF

GTID:2268330392473590

Subject:Computer Science and Technology

Abstract/Summary:


With the development of Web 2.0 technology, Internet users are no longer only information seekers; they have also become information creators. Meanwhile, with the emergence of SNS (Social Network Services), more and more people have taken on the role of information creator and have developed into a "self-media" group. Social media is convenient and real-time, so user-written texts are mainly short texts, and their number exceeds billions. As information grows explosively, users no longer need masses of information; instead, they seek integrated information.

In response to this demand, this paper studies and implements a web text mining system. First, the web texts are preprocessed, and texts with similar topics are aggregated into topic clusters. Then, based on the proposed topic mining model, the system performs topic detection and trend prediction. After text clustering analysis and topic mining, disordered texts are integrated into topic descriptions. The main work and innovations of this paper are as follows.

First, a semantic distance calculation method for short texts is studied and implemented. The method accounts for how both the words of a short text and their structure affect the text's semantic representation. It treats semantic distance as a combination of word distance and structural distance. For the structural distance, we found that the maximum matching between texts, based on HIT-CIR Tongyici Cilin (Extended Edition), was the best indicator of how similarly the word sequences are arranged. We then compared word meanings between texts using a word similarity measure. This measure is an improved edit distance that assigns each word a weight according to its type. Finally, we calculate the semantic distance between texts as a balance of the structural distance and the word distance.
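The distance described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation:

- The tiny `CILIN` dictionary is a hypothetical stand-in for Tongyici Cilin semantic category codes.
- The part-of-speech weights in `TYPE_WEIGHT` are invented.
- The structural distance is one plausible reading of "maximum matching": the matching coverage scaled by an order-agreement factor.
- The balance parameter `alpha` is assumed.

```python
from itertools import combinations

# Hypothetical stand-in for HIT-CIR Tongyici Cilin: word -> semantic category code.
CILIN = {"buy": "Hj01", "purchase": "Hj01", "phone": "Bo21", "cellphone": "Bo21",
         "new": "Eb28", "latest": "Eb28", "I": "Aa02"}
# Hypothetical word types (part of speech) and per-type edit weights.
POS = {"buy": "v", "purchase": "v", "phone": "n", "cellphone": "n",
       "new": "a", "latest": "a", "I": "r"}
TYPE_WEIGHT = {"n": 1.0, "v": 1.0, "a": 0.6, "r": 0.3}

def weight(word):
    """Weight of a word according to its (assumed) type; unknown words count as nouns."""
    return TYPE_WEIGHT.get(POS.get(word, "n"), 1.0)

def same_sense(a, b):
    """Two words match if identical or in the same (stand-in) Cilin category."""
    return a == b or (a in CILIN and CILIN.get(a) == CILIN.get(b))

def word_distance(a, b):
    """Weighted edit distance: insert/delete costs the word's type weight,
    substituting same-sense words is free. Normalized to [0, 1]."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + weight(a[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + weight(b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0.0 if same_sense(a[i - 1], b[j - 1]) else max(weight(a[i - 1]), weight(b[j - 1]))
            d[i][j] = min(d[i - 1][j] + weight(a[i - 1]),
                          d[i][j - 1] + weight(b[j - 1]),
                          d[i - 1][j - 1] + sub)
    total = sum(map(weight, a)) + sum(map(weight, b))  # cost of deleting all + inserting all
    return d[m][n] / total if total else 0.0

def max_matching(a, b):
    """Maximum bipartite matching of same-sense word positions (Kuhn's algorithm)."""
    match_b = [-1] * len(b)  # match_b[j] = index in a matched to b[j], or -1

    def try_assign(i, seen):
        for j in range(len(b)):
            if not seen[j] and same_sense(a[i], b[j]):
                seen[j] = True
                if match_b[j] == -1 or try_assign(match_b[j], seen):
                    match_b[j] = i
                    return True
        return False

    for i in range(len(a)):
        try_assign(i, [False] * len(b))
    return sorted((i, j) for j, i in enumerate(match_b) if i != -1)

def structural_distance(a, b):
    """1 - coverage * order agreement, where coverage is the share of matched words
    and order agreement is the share of matched pairs that are not inverted."""
    if not a and not b:
        return 0.0
    pairs = max_matching(a, b)
    coverage = 2 * len(pairs) / (len(a) + len(b))
    inversions = sum(1 for (i1, j1), (i2, j2) in combinations(pairs, 2)
                     if (i1 - i2) * (j1 - j2) < 0)
    n_pairs = len(pairs) * (len(pairs) - 1) // 2
    order = 1 - inversions / n_pairs if n_pairs else 1.0
    return 1 - coverage * order

def semantic_distance(a, b, alpha=0.5):
    """Balance of word distance and structural distance (alpha is an assumed weight)."""
    return alpha * word_distance(a, b) + (1 - alpha) * structural_distance(a, b)

if __name__ == "__main__":
    s1 = ["I", "buy", "new", "phone"]
    s2 = ["I", "purchase", "latest", "cellphone"]
    print(semantic_distance(s1, s2), semantic_distance(s1, list(reversed(s2))))
```

Kuhn's algorithm returns an arbitrary maximum matching. With repeated words, the inversion count can therefore depend on which matching it finds. A fuller version would prefer order-preserving matchings.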
Experimental results show that our method performs better than the two classical distance calculation models.

Second, this paper presents a length penalty algorithm for short texts based on the number of content words in the text. To eliminate the influence of sentence length, we use the distinct word length to adjust the semantic distance above. Using Heaps' law and Zipf's law, we present a method for estimating the distinct word length.

Finally, a topic mining model is studied and implemented; topic mining comprises topic detection and trend prediction. Topic detection analyzes the texts in a topic cluster and extracts keyword descriptions of that cluster. Trend prediction builds on the topic description and analyzes the topic's trend to predict its development. Topic propagation is directly related to users' attention, and the presence of active users strongly influences how a topic propagates. This paper analyzes a user attention model and finds that it can predict the trend of a topic.

In addition, we build a tweet information database from the Tweets corpus of the Twitter retrieval task in TREC 2011. The database preserves the original field information of the corpus. Based on the hashtags in the tweets, we classify the corpus into classes that can be used for short text classification. We also build an information flow network based on the flow of information between users. With this database, researchers in related fields can carry out their research efficiently.
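A minimal sketch of distinct-word-length estimation follows. It rests on the following assumptions, none of which the abstract states:

- Texts are treated as independent draws from a Zipf distribution. Under that model, the expected number of distinct words in n tokens is the sum over ranks r of 1 - (1 - p_r)^n.
- Heaps' law, V(n) = K * n^beta, is fitted to such points by least squares in log-log space.
- The penalty in the hypothetical `length_adjusted_distance` scales similarity by the ratio of the two texts' estimated distinct lengths. The abstract does not give the exact adjustment.

```python
import math

def zipf_expected_distinct(n_tokens, vocab_size, s=1.0):
    """Expected number of distinct word types after n_tokens independent draws
    from a Zipf(s) distribution over vocab_size types."""
    weights = [r ** -s for r in range(1, vocab_size + 1)]
    total = sum(weights)
    # A type with probability p appears at least once with probability 1 - (1 - p)^n.
    return sum(1.0 - (1.0 - w / total) ** n_tokens for w in weights)

def fit_heaps(lengths, distinct_counts):
    """Fit Heaps' law V(n) = K * n**beta by least squares on log V = log K + beta * log n."""
    xs = [math.log(n) for n in lengths]
    ys = [math.log(v) for v in distinct_counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
    k = math.exp(my - beta * mx)
    return k, beta

def heaps_distinct(n_tokens, k, beta):
    """Estimated distinct-word length of a text with n_tokens content words."""
    return k * n_tokens ** beta

def length_adjusted_distance(raw_distance, len_a, len_b, k, beta):
    """Hypothetical penalty: shrink similarity (1 - distance) by the ratio of
    the two texts' estimated distinct-word lengths; equal lengths leave it unchanged."""
    va, vb = heaps_distinct(len_a, k, beta), heaps_distinct(len_b, k, beta)
    ratio = min(va, vb) / max(va, vb)
    return 1.0 - (1.0 - raw_distance) * ratio

if __name__ == "__main__":
    lengths = [5, 10, 20, 40, 80]
    counts = [zipf_expected_distinct(n, vocab_size=5000) for n in lengths]
    k, beta = fit_heaps(lengths, counts)
    print(f"K={k:.3f}, beta={beta:.3f}")
    print(length_adjusted_distance(0.3, 8, 30, k, beta))
```

The distinct length is estimated from the token count alone rather than counted directly. This reflects the role Heaps' law plays here: it gives a smooth, corpus-level expectation that is less noisy than the raw type count of a very short text.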

Keywords/Search Tags:

short text processing, semantic cluster, hot topic mining, trends prediction


Related items

1	Research And Application Of Topic Model For Short Texts Based On Part-of-Speech Feature And Semantic Enhancement
2	The Research On Short Text Semantic Mining Based On Topic Model And Word Vector
3	Research On Short Text Topic Information Mining Technology
4	Research On Topic Evolution Of Short Text Based On Self-Aggregation Strategy
5	Mongolian Short Text Semantic Similarity Calculation Based On Deep VAE Integrated With Topic Information
6	Short Text Topic Modeling Research Based On The Semantic Extension Of Knowledge Graph
7	Research On Key Technologies Of Short Text Hot Topic Detection
8	Construction Of Hierarchical Semantic Graph And Its Application In Text Mining
9	Event Classification And Tracking Topic Trends Based On Sequential Short Text
10	Short Text Topic Mining Based On W-BTM And Text Classification Application