The Key Technology Research On Internet Short-Text Information Classification

Posted on: 2010-11-05
Degree: Master
Type: Thesis
Country: China
Candidate: C M Chai
GTID: 2178360275970357
Subject: Communication and Information System
Abstract/Summary:
This thesis examines the classification of short-text information on the Internet in some depth. By adapting traditional text classification and combining it with an improved KNN algorithm, it improves the accuracy and generality of Internet short-text classification.

The first part discusses the characteristics of short text, which differs from ordinary text in several distinct ways. Short text here includes content that appears frequently on BBS and blog sites, namely brief main posts and replies. The thesis therefore focuses on the short text that currently appears most often on BBS and blogs. Many existing methods do not classify short text effectively, so research on classification algorithms aimed at short text has theoretical value. For this reason, the thesis proposes a technique for intelligent classification of Internet media content based on the improved KNN algorithm. It applies to short or fragmented text on the Internet, chiefly brief main posts and replies on BBS and blogs.

The second part discusses traditional text classification. It reviews prior achievements in the field and surveys the basic research on text classification, word segmentation, text representation, feature selection, and classification algorithms. It also argues that text information classification will become an important research direction as the technology develops. Building on this work, the thesis explains the classification algorithm in detail.

Because many existing methods perform poorly on short text, the thesis analyzes the characteristics of short text closely and concludes that combining classification based on the improved KNN algorithm with semantics-based weight settings is one of the best solutions to the problem. Experiments verify that this approach classifies short text effectively, and an intelligent classification system is built on it. As the technology continues to develop, applying traditional text classification algorithms, grounded in natural language understanding, to the intelligent classification of short-text information is expected to be of great significance for network monitoring and management and for guiding the public.
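The abstract does not spell out how the KNN algorithm is improved or how the semantic weights are computed, so the following is only a minimal sketch of the general idea: a cosine-similarity KNN classifier over term vectors, with an optional per-term weight table standing in for the semantics-based weights. The function names, the `term_weights` parameter, the similarity-weighted vote, and the toy training posts are all assumptions made for illustration and are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict


def vectorize(tokens, term_weights=None):
    """Turn a token list into a weighted term-frequency vector (a dict)."""
    term_weights = term_weights or {}
    counts = Counter(tokens)
    # Assumed stand-in for semantic weighting: scale each raw count by a
    # per-term weight, defaulting to 1.0 for terms without an entry.
    return {t: c * term_weights.get(t, 1.0) for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query_tokens, training, k=3, term_weights=None):
    """Label a short text by a similarity-weighted vote of its k nearest
    training examples. `training` is a list of (tokens, label) pairs."""
    q = vectorize(query_tokens, term_weights)
    scored = [
        (cosine(q, vectorize(tokens, term_weights)), label)
        for tokens, label in training
    ]
    # Sort by similarity only, most similar first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim  # closer neighbours count for more
    if not votes:
        return None
    return max(votes, key=votes.get)


# Toy, already-segmented posts (hypothetical data).
training = [
    (["cheap", "phone", "sale"], "ads"),
    (["discount", "phone", "offer"], "ads"),
    (["match", "goal", "team"], "sports"),
    (["team", "win", "league"], "sports"),
]

print(knn_classify(["phone", "discount"], training, k=3))
print(knn_classify(["team", "goal"], training, k=3))
```

Weighting each neighbour's vote by its similarity is one common way to keep distant neighbours from outvoting close ones when texts are very short and share few terms. Whether the thesis uses this particular variant is not stated in the abstract.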
Keywords/Search Tags: Short-text, KNN Algorithm, Text Classification, Natural Language