
Research And Implementation Of Text Classification System Based On VSM

Posted on: 2006-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: Z G Chen
Full Text: PDF
GTID: 2178360182476554
Subject: Computer system architecture
Abstract/Summary:
The Internet was a milestone in the history of science and technology at the end of the 20th century, and it has pushed society into an information age centred on the network. With the explosive growth of web information, people have difficulty finding the information they need among the mass of available information, a problem known as "information confusion". Text classification has therefore become increasingly important for locating the required information.

The Vector Space Model (VSM) is a popular model for large-scale text processing. We discuss the key techniques of text classification based on VSM, including text representation, the basic concepts of VSM, feature extraction and selection, and text classification.

The traditional vector space model cannot distinguish the importance of document terms at different positions, or their ability to express the document content. After analysing these problems, this paper modifies the term-weighting formula. Theoretical analysis and experimental results show that the new algorithm greatly improves the performance of VSM.

For feature extraction, this paper considers indexes such as frequency, distribution and concentration together and proposes a new feature extraction algorithm, which allows the selected feature terms to be optimised across these indexes.

We implement an automatic text classification system based on the class-centre method and validate its feasibility. On the basis of this system, this paper further implements a text classification system with a two-level mode. The experimental results show that the two-level classification mode markedly improves recall and precision compared with the class-centre method.

Concept Space can describe the inner relations between texts and mine deep corpus structure. This paper introduces the notion of Concept Space and builds a method for text description and dimensionality reduction in it. Using the orthogonality and feature extraction properties of the Concept Space, the system performs well at filtering noise and reducing dimensionality.
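
As a rough illustration of the class-centre method mentioned in the abstract, the sketch below represents each document as a term-weight vector, averages the vectors of each class into a centre, and assigns a new document to the class whose centre has the highest cosine similarity. It is a minimal sketch under stated assumptions, not the thesis's implementation: it uses ordinary TF-IDF weighting rather than the modified position-aware formula (which the abstract does not give), and the names `train`, `classify`, `normalise`, `DOCS` and `LABELS` and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def normalise(vec):
    """Scale a sparse {term: weight} vector to unit length (empty if all zero)."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def train(docs, labels):
    """Build IDF weights and one unit-length centre vector per class.

    docs is a list of token lists; labels is the parallel list of class names.
    The IDF form log(N / df) + 1 is an assumption, chosen so that terms
    occurring in every document still get a non-zero weight.
    """
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    sums = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        vec = normalise({t: c * idf[t] for t, c in Counter(doc).items()})
        for t, w in vec.items():
            sums[label][t] += w

    # Normalising the per-class sum gives the same direction as the mean.
    centres = {label: normalise(vec) for label, vec in sums.items()}
    return idf, centres


def classify(doc, idf, centres):
    """Return the label of the class centre closest to doc by cosine similarity."""
    vec = normalise({t: c * idf[t] for t, c in Counter(doc).items() if t in idf})

    def cosine(centre):
        # Both vectors are unit length, so the dot product is the cosine.
        return sum(w * centre.get(t, 0.0) for t, w in vec.items())

    return max(centres, key=lambda label: cosine(centres[label]))


# Toy training data (hypothetical), already tokenised.
DOCS = [
    ["football", "match", "goal"],
    ["team", "goal", "score"],
    ["stock", "market", "price"],
    ["bank", "market", "loan"],
]
LABELS = ["sports", "sports", "finance", "finance"]

idf, centres = train(DOCS, LABELS)
print(classify(["goal", "team"], idf, centres))
print(classify(["market", "bank"], idf, centres))
```

A two-level scheme like the one the abstract describes could reuse `classify` as its first stage and apply a second classifier within the selected class, but the abstract does not specify how the two levels are defined, so that part is left out here.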
Keywords/Search Tags: Text Classification, Vector Space Model, Term Weighting, Feature Extraction, Two-level Classification Mode, Concept Space