
Research On Several Problems In Text Retrieval

Posted on: 2007-09-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X J Wang
Full Text: PDF
GTID: 1118360185467803
Subject: Signal and Information Processing
Abstract/Summary:
Information retrieval (IR) technology aims to identify and acquire information from a collection of information, and it plays an important role in study and scientific research. Today in particular, as the Internet is used ever more widely and the quantity of information grows sharply, IR has become an efficient means for people to develop and exploit all kinds of information resources and to acquire and absorb information quickly and comprehensively. This thesis addresses several technologies related to information retrieval, including document processing, text classification, and query optimization. The results achieved in this dissertation are as follows:

1. Feature selection in text classification

We introduce the concepts of absolute reliability, relative reliability, and composite reliability, and propose a feature selection algorithm based on mutual information reliability. The algorithm combines the correlation between a term and a class with the difference in the term across all classes, i.e., the reliability of the maximum mutual information among classes. Experiments show that, compared with the basic mutual information function, the algorithm based on mutual information reliability effectively improves precision, recall, and the F1 measure. Furthermore, we apply normalization to several traditional feature selection functions, or perform local feature selection based on them. Experiments show that both normalized and local feature selection improve classification precision to some degree.

2. Multiclass classification

It is common to set a threshold for each class to handle the case where a text may belong to more than one class: when the similarity between the text and a class is above that class's threshold, the text is assigned to the class. In this thesis, we study the determination of...
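The two baselines mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common document-frequency form of the basic mutual information function, MI(t, c) = log(P(t, c) / (P(t) P(c))), with a term's global score taken as its maximum over classes. It does not implement the reliability-based variant proposed in the thesis, whose definition is not given here. The function names, the tokenized-document input format, and the use of the maximum as the combining rule are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def mutual_information_scores(docs, labels):
    """Score each term by its maximum mutual information over classes.

    docs   : list of token lists (hypothetical input format)
    labels : list of class labels, one per document
    MI(t, c) = log(N * A / (df(t) * n(c))), where A is the number of
    documents of class c that contain term t.
    """
    n_docs = len(docs)
    class_count = Counter(labels)        # n(c): documents per class
    term_count = Counter()               # df(t): documents containing t
    joint_count = defaultdict(Counter)   # A: documents of class c containing t

    for tokens, cls in zip(docs, labels):
        for term in set(tokens):         # count each term once per document
            term_count[term] += 1
            joint_count[term][cls] += 1

    scores = {}
    for term, per_class in joint_count.items():
        # Only classes where the term occurs are considered, so A > 0.
        scores[term] = max(
            math.log(n_docs * a / (term_count[term] * class_count[cls]))
            for cls, a in per_class.items()
        )
    return scores


def select_features(docs, labels, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = mutual_information_scores(docs, labels)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def assign_classes(similarities, thresholds):
    """Assign a text to every class whose similarity is above its threshold.

    similarities, thresholds : dicts keyed by class label (assumed format).
    """
    return [c for c, s in similarities.items() if s > thresholds[c]]
```

A term spread evenly across classes scores zero under this function, while a term confined to one class scores higher. The basic function also favors rare terms, which is one reason normalized or otherwise adjusted variants are studied.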
Keywords/Search Tags: information retrieval, text image, drop fall algorithm, text classification, feature selection, query optimization, query expansion, relevance feedback, mutual information