
Electronic Document Information Mining System

Posted on: 2004-06-07  Degree: Master  Type: Thesis
Country: China  Candidate: L J Cai
GTID: 2208360092990604  Subject: Control Engineering
Abstract/Summary:
With the rapid growth of the Internet and its information services, and with data mining (DM) technology already applied successfully to databases, it has become possible to study Web information mining, and Web data mining in particular. Beginning with the definition of DM and its functions, models and algorithms, the paper also examines its background, the evaluation of DM techniques, and the current state of the field. It then describes the framework of a DM system, focusing on an analysis of the three most common Web DM techniques. The Web log mining model has serious shortcomings, such as low accuracy, high cost and poor efficiency, which make it unsuitable for electronic documents. The vector space model (VSM) and document filtering based on sample learning are essentially methods of document comparison and pattern filtering; their vector dimensionality and computational cost are very large, and their efficiency is low. They also handle uncertain information poorly, because errors can arise when keywords are weighted. Finally, the paper proposes a practical electronic document information mining system as a solution. Because the Internet holds documents of many types and languages, building a database with a uniform format across it is very complicated. In contrast to the traditional data mining process, this paper therefore establishes mirror sites of Internet services: once electronic documents have been mined, a new base is built from the documents useful to users, improving both their ability to handle information and the speed at which they do so. Using the IDEF method to build the framework, dynamic and functional models, the system also introduces a non-backtracking search algorithm over a double-scanning buffer and a double-track structure for the search process.
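The abstract does not describe the non-backtracking search algorithm in detail. As a hedged illustration only, the sketch below assumes it resembles Knuth-Morris-Pratt (KMP) matching, the best-known string search in which the input pointer never moves backwards. That property lets text be scanned one buffer at a time, with the matcher state carried across buffer boundaries, so an earlier buffer never has to be re-read. The function names `build_failure` and `search_chunks` are hypothetical, and the thesis's actual algorithm and its double-scanning buffer may well differ.

```python
from typing import Iterable, List


def build_failure(pattern: str) -> List[int]:
    """KMP failure table: fail[i] is the length of the longest proper
    prefix of pattern[:i+1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def search_chunks(pattern: str, chunks: Iterable[str]) -> List[int]:
    """Return the start offsets of every match of pattern in the
    concatenation of chunks (hypothetical helper, assumed KMP-style).

    The scan position only moves forward, so each buffer can be
    discarded as soon as it has been processed."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    fail = build_failure(pattern)
    matches: List[int] = []
    k = 0       # number of pattern characters currently matched
    offset = 0  # absolute position of the current character
    for chunk in chunks:
        for ch in chunk:
            # On a mismatch, fall back within the pattern, never the text.
            while k > 0 and ch != pattern[k]:
                k = fail[k - 1]
            if ch == pattern[k]:
                k += 1
            if k == len(pattern):
                matches.append(offset - len(pattern) + 1)
                k = fail[k - 1]  # allow overlapping matches
            offset += 1
    return matches
```

For example, `search_chunks("abab", ["xxab", "abab", "z"])` returns `[2, 4]`; the first match spans the boundary between the first two buffers, yet no earlier buffer is revisited.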
Based on the characteristics of e-mail monitoring and of electronic document mining technology, Bayes classifiers are built to strengthen the e-mail monitoring system in which electronic document mining is applied. In addition, a dual C/S and B/S (client/server and browser/server) architecture is constructed, taking into account the functional relationships among the mining process, system-level mining and program handling. The system supports mining, publishing and managing electronic documents, e-mail monitoring, and system safeguarding.
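The abstract says only that Bayes classifiers are used to strengthen e-mail monitoring. A common form is multinomial naive Bayes, sketched below under that assumption with add-one (Laplace) smoothing and a deliberately naive whitespace tokenizer. The class, method and label names (`NaiveBayes`, `"suspicious"`, `"normal"`) are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.doc_counts: Counter = Counter()           # documents per label
        self.word_counts: Dict[str, Counter] = {}      # token counts per label
        self.vocab: Set[str] = set()

    def train(self, samples: List[Tuple[str, str]]) -> None:
        for text, label in samples:
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for tok in tokenize(text):
                counts[tok] += 1
                self.vocab.add(tok)

    def log_score(self, text: str, label: str) -> float:
        """Unnormalised log posterior: log P(label) + sum of log P(token | label)."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for tok in tokenize(text):
            if tok in self.vocab:  # ignore tokens never seen in training
                score += math.log((counts[tok] + 1) / denom)
        return score

    def classify(self, text: str) -> str:
        # Pick the label with the highest log score.
        return max(self.doc_counts, key=lambda lab: self.log_score(text, lab))
```

After training on `("win a free prize now", "suspicious")`, `("cheap prize offer", "suspicious")`, `("meeting agenda attached", "normal")` and `("project report draft attached", "normal")`, `classify("free prize")` returns `"suspicious"` and `classify("report attached")` returns `"normal"`. Working in log space avoids underflow when a message contains many tokens.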
Keywords/Search Tags: DM, Electronic document, Web log mining, VSM, IDEF method, Non-backtracking search algorithm, Double-track structure, E-mail monitoring