
Study On Session-Based Filtering Of Massive Short Messages

Posted on: 2012-05-04
Degree: Master
Type: Thesis
Country: China
Candidate: P Zhang
Full Text: PDF
GTID: 2218330338453815
Subject: Computer software and theory
Abstract/Summary:
As short messages become more and more widely used, the corresponding filtering technology is receiving growing attention. Short message filtering must quickly identify the target messages within a massive message set so that they can be used in follow-up processing. At present, most short message filtering methods are used to remove spam, with the main purpose of blocking illegal and meaningless messages. Improving the efficiency and accuracy of filtering has always been the objective of such studies. Current research on short message filtering focuses on filter accuracy, and less work addresses filter efficiency.

This thesis studies the short message filtering process and the characteristics of short message data in depth. On this basis, it makes the following contributions.

Current filtering methods mainly rely on keyword matching and process messages one by one. To improve filtering efficiency, we analyze the relations between messages and propose dividing the message set into sessions and extracting a feature vector for each session. The thesis gives the method and rationale for session division and the feature extraction process.

To filter quickly, we use the keywords of each session's feature vector to build an index and filter with that index. The index consists of two sub-indexes: the first is composed of keywords and session IDs and is an inverted index; the second is composed of sessions and message IDs and is an ordinary (non-inverted) index. The thesis gives the structure of the index and the algorithm for constructing it.

To solve the problem of updating the user template, the thesis proposes methods for maintaining and updating it, in which new features are discovered and old features are retired based on feedback from the filtering results. It also gives the evaluation method for feature words and the structure and algorithm used to maintain them.

Finally, we present an experiment that tests the proposed method and validates the performance of both the word-index-based short message filtering method and the user template updating method. The proposed method effectively improves the response speed of the filter and can also update the keywords in the user template vector.
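The two-part index described in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual construction algorithm, which the abstract does not spell out. The class name `SessionIndex`, its method names, and the rule that a session matches when it shares at least one keyword with the user template are all assumptions made for illustration.

```python
from collections import defaultdict


class SessionIndex:
    """Hypothetical two-level index: keyword -> sessions, session -> messages."""

    def __init__(self):
        # First sub-index (inverted): keyword -> set of session IDs.
        self.keyword_to_sessions = defaultdict(set)
        # Second sub-index (ordinary): session ID -> list of message IDs.
        self.session_to_messages = defaultdict(list)

    def add_session(self, session_id, keywords, message_ids):
        """Register a session by its feature-vector keywords and its messages."""
        for keyword in keywords:
            self.keyword_to_sessions[keyword].add(session_id)
        self.session_to_messages[session_id].extend(message_ids)

    def filter(self, template_keywords):
        """Return message IDs of every session sharing a keyword with the template.

        Assumed matching rule: one shared keyword is enough for a match.
        """
        matched_sessions = set()
        for keyword in template_keywords:
            # .get avoids creating empty entries for unseen keywords.
            matched_sessions |= self.keyword_to_sessions.get(keyword, set())
        message_ids = []
        for session_id in sorted(matched_sessions):
            message_ids.extend(self.session_to_messages[session_id])
        return message_ids
```

Under these assumptions, a lookup touches only the sessions listed under the template keywords and then expands each matched session to its messages, instead of scanning every message one by one.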
Keywords/Search Tags: short message filtering, session-division, feature extraction, fast-filtering model, feedback study