Research And Implementation Of Automatic Text Classification System Of Biomedical Informatics

Posted on:2015-02-14

Degree:Master

Type:Thesis

Country:China

Candidate:S Wang

Full Text:PDF

GTID:2298330467470415

Subject:Epidemiology and Health Statistics

Abstract/Summary:

Objective: With the development of computer and network technology, an enormous amount of biomedical information has accumulated on the internet. Because this information is vast, widely distributed, and varied, users find it increasingly difficult to use it effectively. Faced with so much online biomedical information, people urgently need effective ways to find and select the information relevant to their needs. Building on research into automatic Chinese text classification, the author therefore set out to develop an automatic text classification system that classifies large volumes of biomedical information automatically and accurately. The aim is to give users proactive, timely, and useful help, to improve their work efficiency, and to provide an important reference for decision-making.

Methods: The system used the widely adopted vector space model (VSM) for text representation, Visual Studio 2010 as the development platform, C++ as the programming language, and Qt for the interface design. The prototype system used the ICTCLAS word segmentation system from the Chinese Academy of Sciences. Term weights were calculated with TF-IDF, features were selected with information gain (IG), and documents were classified with the k-nearest-neighbours (kNN) algorithm. Classification results were evaluated by macro-averaged precision (MacroP), macro-averaged recall (MacroR), macro-averaged F1 (MacroF1), and micro-averaged F1 (MicroF1). The first version of the system implemented the standard prototype algorithms. It was then improved to remedy the defects and shortcomings of those algorithms, and the prototype and improved systems were compared and discussed. Because no ready-made Chinese biomedical corpus was available, a self-built corpus was used to train and test the classification system.
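The pipeline described in the Methods paragraph (TF-IDF weighting in a vector space model, IG feature selection, kNN classification, and macro/micro-averaged evaluation) can be illustrated with a minimal sketch. This is not the author's C++/Qt implementation. It assumes documents have already been segmented into token lists (for example by ICTCLAS), uses the textbook weight tf(t, d) * log(N / df(t)) rather than the adjusted weighting mentioned in the Result, and uses cosine similarity with a similarity-weighted vote for kNN. The value k = 3 is arbitrary, and all function names (`fit_idf`, `vectorize`, `info_gain`, `knn_predict`, `macro_micro`) are hypothetical. MacroF1 is computed here as the mean of per-class F1. Some literature instead takes the harmonic mean of MacroP and MacroR, and the abstract does not say which the thesis used. The proposed TF-IDF-DF method is not reproduced because the abstract gives no formula for it.

```python
import math
from collections import Counter

# Each document is a list of tokens, assumed to be already produced by a
# Chinese word segmenter such as ICTCLAS (segmentation is not shown here).


def fit_idf(docs):
    """idf(t) = log(N / df(t)) computed over the training documents."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log(n / c) for t, c in df.items()}


def vectorize(doc, idf):
    """VSM vector with w(t, d) = tf(t, d) * idf(t), L2-normalised.
    Terms missing from `idf` (unseen or not selected) are dropped."""
    tf = Counter(t for t in doc if t in idf)
    v = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in v.values())) or 1.0
    return {t: w / norm for t, w in v.items()}


def info_gain(docs, labels):
    """IG(t) = H(C) - P(t) H(C|t) - P(~t) H(C|~t), based on document presence."""
    n = len(docs)
    classes = Counter(labels)
    h_c = -sum(k / n * math.log(k / n) for k in classes.values())
    with_term = {}
    for d, y in zip(docs, labels):
        for t in set(d):
            with_term.setdefault(t, Counter())[y] += 1
    scores = {}
    for t, present in with_term.items():
        absent = {c: classes[c] - present[c] for c in classes}
        ig = h_c
        for part in (present, absent):
            size = sum(part.values())
            if size:
                # adds P(part) * sum_c P(c|part) log P(c|part) = -P(part) * H(C|part)
                ig += size / n * sum(k / size * math.log(k / size)
                                     for k in part.values() if k)
        scores[t] = ig
    return scores


def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. cosine similarity."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(x, train_vecs, train_labels, k=3):
    """Similarity-weighted vote among the k most similar training documents."""
    sims = sorted(((cosine(x, v), y) for v, y in zip(train_vecs, train_labels)),
                  key=lambda s: s[0], reverse=True)
    votes = Counter()
    for sim, y in sims[:k]:
        votes[y] += sim
    return max(votes, key=votes.get)


def macro_micro(y_true, y_pred):
    """Return (MacroP, MacroR, MacroF1, MicroF1) for single-label predictions."""
    def f1(p, r):
        return 2 * p * r / (p + r) if p + r else 0.0

    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    precs = [tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0 for c in classes]
    recs = [tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0 for c in classes]
    m = len(classes)
    macro_f1 = sum(f1(p, r) for p, r in zip(precs, recs)) / m  # mean per-class F1
    big_tp, big_fp, big_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_f1 = f1(big_tp / (big_tp + big_fp), big_tp / (big_tp + big_fn))
    return sum(precs) / m, sum(recs) / m, macro_f1, micro_f1
```

To apply IG feature selection in this sketch, rank terms by `info_gain`, keep the top-scoring ones, and pass the filtered dictionary `{t: idf[t] for t in selected}` to `vectorize`, so that unselected terms drop out of every document vector. Weighting each kNN vote by similarity, instead of counting votes equally, is one common way to reduce the influence of distant neighbours.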
Result: During development, the TF-IDF term-weighting calculation and the kNN classification algorithm were adjusted and improved, and TF-IDF-DF, a new feature selection method based on TF-IDF, was proposed.

Conclusion: Testing and validation showed that the proposed selection method outperforms IG feature selection and improves the performance of the classification system. The system can classify biomedical information quickly and accurately and helps in organizing and retrieving biomedical information. Combined with a search engine, it can deliver quick, accurate, and timely classified information to interested users.

Keywords/Search Tags:

Automatic text categorization, TF-IDF, Feature Selection, Classification algorithm, IG, KNN

Related items

1	Multi-class Scientific Literature Automatic Categorization System
2	Studies On Some Essential Problems In Automatic Text Categorization
3	Research On Automatic Text Categorization System Based On Neuron Network
4	The Research And Implementation Of Automatic Text Categorization For Chinese Web Documents
5	Research And Implementation Of The Automatic Chinese Text Categorization
6	Research And Implementation Of Chinese Text Categorization Methods Based On Tree-like Keywords Set
7	The Research Of Text Representation And Feature Selection In Text Categorization
8	Design And Realization Of Automated Text Categorization System For Chinese Documents Based On Relevancy
9	Theoretical Analysis And Algorithm Study On Feature Selection For Text Categorization
10	A Study On Text Categorization Based On Machine Learning