
Web Concept Mining Based On Text Layer Model

Posted on: 2003-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: C Z Zhang
Full Text: PDF
GTID: 2168360065462212
Subject: Agricultural economics and management
Abstract/Summary:
To improve the performance of web text mining tools, this paper applies automatic indexing and automatic classification techniques, data mining, pattern recognition, and mathematical statistics to build a practical model, the Text Layer Model, which can extract information from three kinds of data on the Internet. The paper makes three contributions: a new method for building the knowledge database used in automatic classification, a location weighting algorithm for information extraction, and new methods that improve the recognition of Chinese synonyms and unregistered words.

The knowledge database used for automatic classification is built with data mining techniques and mathematical statistics. We use the Dice measure, support degree, and confidence degree to create four databases of different dimensions by applying different thresholds on correlation degree and interestingness degree. We then select one of these databases by testing them in the concept mining system.

To distinguish how well different parts of a text express its subject, we carried out a statistical survey of 1800 Web pages and, based on the results, propose the location weighting algorithm for information extraction.

To improve synonym recognition, we use a synonym dictionary as the semantic system and propose a new synonym recognition algorithm based on it. The algorithm calculates the degree of similarity between words and is used to match subjects during automatic classification.

We also propose a new method for mining unregistered words: a recognition method based on character or word expansion. Unlike the N-gram model, this method uses the location information of the text to recognize unregistered words.

At the end of the paper, we test and evaluate the concept mining system and describe its deficiencies objectively.
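
The abstract names the Dice measure, support degree, and confidence degree but does not give their definitions. The following is a minimal sketch using the standard textbook definitions over term co-occurrence in documents; the function names, the sample documents, and the threshold values are assumptions for illustration and are not taken from the thesis.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_stats(documents):
    """Compute Dice, support, and confidence for every co-occurring term pair.

    `documents` is a list of term collections (hypothetical input format).
    Standard definitions are assumed; the thesis may define them differently.
    """
    n = len(documents)
    term_count = Counter()  # number of documents containing each term
    pair_count = Counter()  # number of documents containing both terms
    for terms in documents:
        unique = set(terms)
        term_count.update(unique)
        pair_count.update(combinations(sorted(unique), 2))

    stats = {}
    for (a, b), both in pair_count.items():
        stats[(a, b)] = {
            # Dice: 2 * |A and B| / (|A| + |B|)
            "dice": 2 * both / (term_count[a] + term_count[b]),
            # Support: share of all documents containing both terms
            "support": both / n,
            # Confidence: conditional frequency in each direction
            "confidence_a_to_b": both / term_count[a],
            "confidence_b_to_a": both / term_count[b],
        }
    return stats


def filter_pairs(stats, min_dice, min_support):
    """Keep only pairs that meet both thresholds (threshold values are illustrative)."""
    return {
        pair: s
        for pair, s in stats.items()
        if s["dice"] >= min_dice and s["support"] >= min_support
    }


if __name__ == "__main__":
    docs = [
        {"web", "mining"},
        {"web", "mining", "text"},
        {"web", "text"},
        {"data"},
    ]
    stats = cooccurrence_stats(docs)
    for pair, s in sorted(filter_pairs(stats, 0.6, 0.5).items()):
        print(pair, s)
```

Varying `min_dice` and `min_support` yields candidate pair sets of different sizes, which is one plausible reading of how different thresholds could produce databases of different dimensions.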
Keywords/Search Tags:web concept mining, text layer model, knowledge database, recognition of synonyms, recognition of unregistered words, automatic indexing, automatic classifying