The Research Of Keyword Extraction Algorithms On English Short Test Text

Posted on: 2014-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: X J Han
GTID: 2248330395498610
Subject: Computer application technology
Abstract/Summary:
The informatization of education has driven the development of virtual learning platforms, and dynamic web page technology has made their interactivity and personalization possible. However, the contradiction between the vast amount of educational resources and learners' limited study time creates a new demand for intelligent virtual learning platforms. Text mining is key to resolving the contradiction between the explosive growth of information and its effective use. Guided by text mining, this thesis studies keyword extraction from educational resources in the form of short texts, in order to uncover their latent patterns or structure.

The thesis studies the text mining theory of text pre-processing and feature item selection, analyzes several typical keyword extraction algorithms such as KEA, PAT-tree and GenEx, and then proposes an adaptive keyword extraction algorithm. The main content and innovations are as follows:

1. Because English examination texts are usually short and flexibly structured, traditional keyword extraction algorithms have difficulty picking out complete and effective feature items from them. This thesis proposes an approach that extracts feature items based on statistical models and semantics, using a word frequency factor, a location factor, a word length factor and a word co-occurrence factor, to ensure the completeness and validity of the feature items.

2. Because the category of the item text changes dynamically, traditional keyword extraction algorithms adapt poorly to different types of text. By adjusting the weighting coefficients of the four feature factors for different texts, the thesis realizes an adaptive feature weighting evaluation model.

3. The adaptive feature weighting evaluation model is optimized by adjusting the feature weighting coefficients with a genetic algorithm, and different models adapted to different types of English examination text are built through experiments. Multi-threading is used to improve the efficiency of the algorithm. Experiments show that the proposed algorithm improves the precision and recall of keyword extraction on English short test texts compared with the TF-IDF and KEA algorithms.

4. Based on the proposed algorithm, a keyword extraction component is designed and applied to a virtual learning platform for College English Test Band 4.
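
The weighted combination of the four feature factors described in points 1 and 2 can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the function names `tokenize`, `factor_scores` and `extract_keywords`, the normalization of each factor to the range 0 to 1, the use of first-occurrence position as the location factor, and the count of distinct neighbours in a small window as the co-occurrence factor. It is not the thesis's implementation.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def factor_scores(text, window=2):
    """Return {term: (frequency, location, length, co-occurrence)}.

    Each factor is scaled to [0, 1]. These definitions are illustrative
    assumptions, not the formulas used in the thesis.
    """
    tokens = tokenize(text)
    if not tokens:
        return {}
    n = len(tokens)
    counts = Counter(tokens)
    max_count = max(counts.values())
    max_len = max(len(term) for term in counts)

    # Location factor: terms that first appear earlier score higher.
    first_pos = {}
    for i, term in enumerate(tokens):
        first_pos.setdefault(term, i)

    # Co-occurrence factor: number of distinct neighbours within the window.
    neighbours = {term: set() for term in counts}
    for i, term in enumerate(tokens):
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i and tokens[j] != term:
                neighbours[term].add(tokens[j])
    max_nb = max(len(s) for s in neighbours.values()) or 1

    return {
        term: (
            counts[term] / max_count,
            1.0 - first_pos[term] / n,
            len(term) / max_len,
            len(neighbours[term]) / max_nb,
        )
        for term in counts
    }


def extract_keywords(text, weights, stopwords=frozenset(), top_k=3):
    """Rank terms by the weighted sum of their four factor scores."""
    scores = {}
    for term, factors in factor_scores(text).items():
        if term in stopwords:
            continue
        scores[term] = sum(w * f for w, f in zip(weights, factors))
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
```

In this sketch the `weights` tuple plays the role of the feature weighting coefficients. A call such as `extract_keywords(text, (0.4, 0.2, 0.1, 0.3), stopwords={"the"})` uses hand-picked values; per the abstract, the thesis instead tunes the coefficients for each text type with a genetic algorithm.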
Keywords/Search Tags: Text mining, Keyword extraction, English test text, Genetic algorithm