
Research And Realization Of Text Mining System For Project Files

Posted on: 2008-01-13
Degree: Master
Type: Thesis
Country: China
Candidate: F Song
GTID: 2178360215980807
Subject: Computer application technology
Abstract:
With the development of computer application technology, the number of electronic files is growing rapidly, and these files contain abundant information in many forms. Because they are stored in unstructured or semi-structured form, however, traditional data mining methods cannot be applied to them directly. Text mining has therefore become a new research hotspot. Its first steps are feature extraction, feature representation and feature selection, which turn files into a structured data representation; knowledge mining methods such as text classification, text clustering, correlation analysis, distribution analysis and trend forecasting can then be used to find useful information. Because the object of text mining is the file, the most common and natural form of information storage, research on text mining has very broad prospects.

Currently, text mining is applied mainly on the Internet, for example in website classification, related search and spam filtering. In enterprise management information systems, which hold large numbers of technical documents, it is rarely applied. To address this gap, a text mining system for project files has been developed according to the characteristics of these files and the needs of the application. It is derived from the file management module of the Intelligent Decision Support System for Project Bidding. The system manages files according to their structure and performs in-depth text mining with the algorithms proposed in this thesis.

First, a nonnegative matrix factorization (NMF) algorithm is used to classify files and to extract a name for each class; a concept-based feature representation space is obtained at the same time. Building on this space, a Text Knowledge Mining Algorithm Based on the Comprehensive Information Space of Concepts is proposed.
In this algorithm, Comprehensive Information theory is applied to the concept space to measure the information carried by each concept. Compared with knowledge mining algorithms that consider semantic information alone, the comprehensive information of features computed by this algorithm provides a more complete reference for decision making. Second, following matter-element theory, files are managed according to their internal structure, and a Numerical Structural Information Extracting Algorithm on Simple Textual Knowledge Elements is proposed. The resulting information space is then extended by exploiting the extensibility of matter-elements. Finally, a text knowledge mining system with a friendly interface is developed; it follows the three-base structure of a database, a model base and a knowledge base. The system was developed with Microsoft Visual Studio .NET 2003 and SQL Server 2000.
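The abstract does not say which NMF variant is used or how class names are extracted. As a minimal sketch under those assumptions, the Python below factors a nonnegative term-by-document count matrix V into W (term weights per class) and H (class weights per file) using the Lee-Seung multiplicative updates for squared error. It then assigns each file to the class with the largest weight in its column of H and names each class by its highest-weighted terms. The helper names (`nmf`, `classify`), the naming rule, and the toy terms and counts are hypothetical; they illustrate the idea, not the thesis's implementation.

```python
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Approximate a nonnegative terms-x-documents matrix V (m x n) as W H,
    with W: m x k (term weights per class) and H: k x n (class weights per file),
    using Lee-Seung multiplicative updates for the squared Frobenius error."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    # Strictly positive random start so the multiplicative updates can move every entry.
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return W, H

def classify(V, terms, k, top=2):
    """Assign each file to its dominant class and name each class by its
    `top` highest-weighted terms (a hypothetical naming rule)."""
    W, H = nmf(V, k)
    labels = [max(range(k), key=lambda i: H[i][j]) for j in range(len(V[0]))]
    names = [[terms[t] for t in sorted(range(len(terms)), key=lambda t: -W[t][i])[:top]]
             for i in range(k)]
    return labels, names

# Hypothetical data: rows are terms, columns are four project files.
terms = ["bid", "price", "tender", "steel", "concrete", "beam"]
V = [[3, 2, 0, 0],
     [2, 3, 0, 0],
     [2, 2, 0, 0],
     [0, 0, 3, 2],
     [0, 0, 2, 3],
     [0, 0, 2, 2]]
labels, names = classify(V, terms, k=2)
print(labels, names)
```

Plain lists keep the sketch dependency-free; a production system would more likely use a numerical library and a weighted feature matrix built after feature selection rather than raw counts.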
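In extension theory, a matter-element is an ordered triple R = (N, c, v): an object N, one of its characteristics c, and the value v of N with respect to c. The abstract does not describe how numerical structural information is extracted from simple textual knowledge elements, so the sketch below assumes a deliberately simple approach. A regular expression recognises a few hypothetical characteristic names followed by a number and a unit, and one matter-element is emitted per match. The characteristic list, the pattern, the `MatterElement` and `extract` names, and the sample sentence are all illustrative assumptions.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class MatterElement:
    """Matter-element R = (N, c, v): object name N, characteristic c,
    value v of N with respect to c, plus the unit found in the text."""
    name: str
    characteristic: str
    value: float
    unit: str

# Hypothetical pattern: a known characteristic name, then (lazily) any
# non-digit text, then a number and a unit word.
PATTERN = re.compile(
    r"(?P<char>bid price|construction period|floor area)\D*?"
    r"(?P<num>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z%]+)",
    re.IGNORECASE,
)

def extract(name, text):
    """Turn numeric statements in a simple text element into matter-elements."""
    return [
        MatterElement(name, m.group("char").lower(), float(m.group("num")), m.group("unit"))
        for m in PATTERN.finditer(text)
    ]

text = ("The bid price is 1250000 yuan and the construction period is 180 days; "
        "floor area: 3200 sqm.")
elements = extract("Tender A", text)
for e in elements:
    print(e)
```

Keeping each fact as a separate (N, c, v) triple makes it straightforward to group the triples by object into a multi-characteristic matter-element or to store them as structured rows in a database.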
Keywords: text mining, project files, nonnegative matrix factorization, comprehensive information space of concepts, matter-element, numerical structural information