
Research On The Variant Short Texts Filtering Algorithm

Posted on: 2014-01-28  Degree: Master  Type: Thesis
Country: China  Candidate: Y X Wen  Full Text: PDF
GTID: 2248330398970644  Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
With the rapid development of mobile phones and the Internet and the rise of social networking, short texts such as mobile phone text messages and microblog posts have become part of people's daily lives. At the same time, large volumes of spam and harmful short texts interfere with normal users and can even cause them losses. Text processing technology is needed to filter out such texts. However, to circumvent conventional filtering methods, harmful short texts often appear in irregular, non-standard variant forms (variant features). We call these variant short texts. Efficient solutions to this problem are lacking, and existing text filtering methods require a large amount of human intervention when dealing with variant short texts.
The work in this paper on variant short texts is as follows:
First, existing text filtering algorithms are studied. The characteristics and difficulties of the problem are examined, and the strengths and weaknesses of the existing filtering algorithms are analyzed.
Second, through a detailed analysis of the characteristics of existing variant short texts, this paper proposes the concept of level features and a short text filtering algorithm based on variant level features.
Third, the key algorithms and key technologies are presented: the filtering algorithm for variant short texts based on level features, together with its training method, which identifies the different level features through automatic learning; a method based on the ROC curve for determining the best threshold; and other key methods such as text preprocessing.
Fourth, a variant short text filtering system based on level features is designed and implemented, and experimental results verify the effectiveness of the algorithm.
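
The abstract mentions choosing the best threshold from the ROC curve but does not say which criterion is used. As an illustration only, the sketch below assumes a filter that gives each short text a numeric score (higher meaning more likely harmful) and picks the threshold that maximizes Youden's J statistic (true positive rate minus false positive rate). The criterion and the function names `roc_points` and `best_threshold` are assumptions made for this example, not the method described in the thesis.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float, float]]:
    """Return (threshold, FPR, TPR) for every distinct score used as a threshold.

    A text is flagged as harmful when its score >= threshold.
    labels: 1 = harmful short text, 0 = normal short text.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both harmful and normal examples are required")

    points = []
    for t in sorted(set(scores), reverse=True):
        # Count flagged texts at this threshold, split by true label.
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points


def best_threshold(scores: List[float], labels: List[int]) -> float:
    """Pick the threshold maximizing Youden's J = TPR - FPR (assumed criterion)."""
    threshold, _fpr, _tpr = max(roc_points(scores, labels), key=lambda p: p[2] - p[1])
    return threshold
```

In use, `scores` would come from running the trained filter on a labeled validation set, and the returned value would serve as the cut-off for flagging new short texts. Other criteria, such as the ROC point closest to the top-left corner or a fixed false positive budget, are equally plausible readings of the abstract.
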
Keywords/Search Tags: Features of Keywords, Text Filtering, Variant Short Texts, Chinese Texts