Text Filtering Key Technologies

Posted on: 2004-07-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y J Xia
Full Text: PDF
GTID: 1118360095962827
Subject: Computer software and theory
Abstract/Summary:
More and more information sources are now available in machine-readable form, owing to the rapid development of communication networks and inexpensive mass storage. For any particular user, however, the information needed is relatively small. Because the data are massive in scale and diverse in subject area, they make the information acquisition bottleneck more severe and greatly challenge the speed, precision, and robustness of processing systems. Efficient and effective techniques for large-scale processing of real-world text have therefore become one of the most urgent demands.

This dissertation focuses on the key techniques of adaptive text filtering. We designed and developed an experimental platform and, based on it, took part in the Filtering track of the Text REtrieval Conference (TREC10 and TREC11), obtaining very good results. At TREC11, we were selected for the first of only three speaking slots on the Filtering track. We have also developed several systems, including a "Chinese text filtering system" and a "Web-Based Trend Search System".

The Vector Space Model (VSM) is used to represent text. There are two principal problems with the VSM: term selection and term weighting. Words, concepts, and terminology are selected as terms, while term weights are calculated from statistical information and heuristic rules. We applied WordNet to the filtering system, using its semantic information and attempting disambiguation. We also developed an interface to HowNet and applied it to the Chinese filtering system by using its concept information. This method enhanced the system's performance while greatly lowering the dimension of the vectors.

Machine learning in adaptive text filtering includes profile learning and threshold learning. Our research concentrates on threshold learning. At TREC10, we presented a novel threshold-adjusting algorithm that adjusts the threshold quickly and efficiently using a small number of samples. At TREC11, we presented a novel method that uses a Winnow classifier built from words to assist the text filtering system. This method greatly enhances the system's performance.
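
The abstract does not say which Winnow variant, feature set, or parameters the dissertation used. As a rough illustration only, the sketch below shows the textbook Winnow update over binary word features, which is one plausible reading of "a Winnow classifier built from words". The class name `Winnow`, the promotion factor `alpha = 2`, the threshold equal to the vocabulary size, and the toy documents are all assumptions made for this example, not details taken from the dissertation.

```python
from typing import Dict, Iterable, List, Set


class Winnow:
    """Textbook Winnow over binary word-presence features (illustrative only)."""

    def __init__(self, vocabulary: Iterable[str], alpha: float = 2.0) -> None:
        # Map each word to a feature index; all weights start at 1.
        self.index: Dict[str, int] = {w: i for i, w in enumerate(vocabulary)}
        self.weights: List[float] = [1.0] * len(self.index)
        self.alpha = alpha                      # promotion/demotion factor (assumed)
        self.theta = float(len(self.index))     # decision threshold (assumed: n)

    def _active(self, words: Set[str]) -> Set[int]:
        # Indices of vocabulary words that occur in the document.
        return {self.index[w] for w in words if w in self.index}

    def score(self, words: Set[str]) -> float:
        # Sum of the weights of the active features.
        return sum(self.weights[i] for i in self._active(words))

    def predict(self, words: Set[str]) -> bool:
        return self.score(words) >= self.theta

    def update(self, words: Set[str], relevant: bool) -> None:
        # Mistake-driven: weights change only when the prediction is wrong.
        if self.predict(words) == relevant:
            return
        # Missed a relevant document: promote active weights.
        # Accepted a non-relevant document: demote active weights.
        factor = self.alpha if relevant else 1.0 / self.alpha
        for i in self._active(words):
            self.weights[i] *= factor


# Toy training data (hypothetical): a set of words and a relevance judgment.
docs = [({"filter", "trec"}, True), ({"soccer", "goal"}, False)]
clf = Winnow(["filter", "trec", "soccer", "goal"])
for _ in range(3):
    for words, relevant in docs:
        clf.update(words, relevant)
```

Because the updates are multiplicative and occur only on mistakes, a classifier of this kind can be trained incrementally as relevance feedback arrives, which suits an adaptive filtering setting. How its output was combined with the main filtering system is not described in the abstract.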
Keywords/Search Tags: Adaptive Text Filtering, Text Feature Extraction, Vector Space Model, Machine Learning, Natural Language Processing