The Application of Text Mining Technology in the Construction of the Wenzhou Baike Site

Posted on: 2011-05-07
Degree: Master
Type: Thesis
Country: China
Candidate: J Su
GTID: 2208360308475773
Subject: Software engineering
Abstract/Summary:
Text mining is one of the core areas of data mining. With the rapid development of Internet technology, the volume of electronic documents and emails is exploding, and large-scale text processing has become a challenge. How to acquire useful information from the Internet and classify it correctly is a key problem in the information systems discipline that needs to be solved urgently.

The project described in this thesis adopts Windows (or Linux) as the operating platform, Apache as the Web server, MySQL as the back-end database, and PHP as the development language, and runs MediaWiki as the application engine of the Wenzhou Encyclopaedia.

The thesis first introduces the profile of the Wenzhou Encyclopaedia, its characteristics, and the system framework used in constructing the site. Starting from the concept of text mining, it then describes the main techniques and application areas of text mining: active information, information retrieval, patent information systems, and text classification. It goes on to describe the construction of the Wenzhou Encyclopaedia corpus, discusses the text classification algorithm and the construction of a text classification model on that corpus, and gives a more detailed comparison and analysis. Finally, the thesis concludes that the classification algorithm is of some use as an aid to classification on the known corpus.
Keywords/Search Tags: System construction, Text mining, Text classification, Algorithms, Dictionaries, Models, Application
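
The abstract does not name the classification algorithm that was used, so the following is only a minimal sketch of what building a text classification model on a labelled corpus can look like. It assumes a multinomial Naive Bayes classifier with add-one smoothing, which is a common baseline and not necessarily the thesis's method. The function names, the category labels, and the four-entry toy corpus are all hypothetical and stand in for real encyclopaedia entries.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Whitespace tokenization is an assumption made for this toy example;
    # Chinese text would need a word segmenter instead.
    return text.lower().split()


def train(labeled_docs):
    """Count documents per class and word occurrences per class."""
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_docs:
        class_doc_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_doc_counts, word_counts, vocab


def classify(text, model):
    """Return the class with the highest log posterior under Naive Bayes."""
    class_doc_counts, word_counts, vocab = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_doc_counts):
        # Log prior: fraction of training documents in this class.
        score = math.log(class_doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed word likelihood.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus standing in for labelled encyclopaedia entries.
corpus = [
    ("river bridge mountain island scenery", "geography"),
    ("mountain river coast island", "geography"),
    ("opera dialect festival temple", "culture"),
    ("dialect opera folk festival", "culture"),
]
model = train(corpus)
print(classify("island mountain scenery", model))
```

Used as an auxiliary classifier in the sense of the abstract, the predicted label would be offered to an editor as a suggested category for a new entry rather than applied automatically.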