
Research On Text Mining Application For Supervision Engineering

Posted on: 2019-03-21
Degree: Master
Type: Thesis
Country: China
Candidate: C Y Yang
Full Text: PDF
GTID: 2428330545488404
Subject: Engineering
Abstract/Summary:
With the continuing advance of enterprise informatization, computer-based management information systems have replaced traditional paper archive management and made the paperless office a reality. From the perspective of information resource management in a construction supervision unit, the most precious asset is the text that records all kinds of information, and the unit's information management system stores a large number of valuable electronic documents. These texts are diverse and disorganized, and they need to be stored in order under the categories that the documents specify. The traditional approach, however, is to annotate categories manually, which suffers from insufficient manpower, incomplete processing, and statistical results that do not match reality. The goal of this work is to let a construction supervision unit classify large volumes of text data quickly and effectively, relieving the shortage of human resources, the heavy workload, and the low work efficiency.

Building on an in-depth study of the theory of text mining, and of text classification in particular, this paper designs an automatic classifier for supervision engineering texts to improve work efficiency. First, the importance of supervision engineering text is analyzed through field research and a review of relevant materials, and the problems of existing management methods are summarized. Second, the theory of text mining is studied in depth, with text classification examined in detail. Third, the existing problems of Chinese word segmentation are summarized; because out-of-vocabulary (unregistered) words cannot be identified, a professional glossary for the supervision engineering field is compiled. Fourth, the TF-IDF algorithm is analyzed, and a TF-IDF algorithm based on title and body is proposed to reflect the different weights that specialized terms carry at different positions in a text. Finally, the naive Bayes classification algorithm is introduced and its Bernoulli and multinomial models are compared; because traditional naive Bayes does not consider that different specialized terms influence classification to different degrees, an improved weighted naive Bayes is proposed. On the basis of this research, a text classification system for supervision engineering is developed in Java.

Experiments are carried out on the supervision notice data set provided by the third ring supervision consulting company, covering feature selection, weight calculation, and the classification algorithm. In the feature selection experiment, the chi-square statistic is determined to be the feature selection algorithm suited to supervision notices. In the weight calculation experiment, the two parameters introduced into the TF-IDF algorithm are set to 1.2 and 0.8. In the classification algorithm experiment, the results of the weighted naive Bayes classifier are verified: on average, precision is 2.7 percent higher, recall is 1.4 percent higher, and F1 is 2 percent higher. The experimental results show that the method used in this paper can effectively classify supervision notices and has practical value.
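
The title-and-body TF-IDF idea described in the abstract can be illustrated with a minimal Java sketch. The abstract does not give the formula, so everything here is an assumption made for illustration: the class and method names are hypothetical, the weighted term frequency is taken to be a position-weighted count divided by document length, the IDF uses a common smoothed form, and the reported values 1.2 and 0.8 are assumed to be the title and body weights respectively.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class PositionWeightedTfIdf {

    // ASSUMPTION: the abstract reports parameter values 1.2 and 0.8 but does not
    // say which position each applies to; title = 1.2, body = 0.8 is a guess.
    static final double TITLE_WEIGHT = 1.2;
    static final double BODY_WEIGHT = 0.8;

    /** A document that has already been segmented into title and body terms. */
    static final class Doc {
        final List<String> title;
        final List<String> body;

        Doc(List<String> title, List<String> body) {
            this.title = title;
            this.body = body;
        }

        boolean contains(String term) {
            return title.contains(term) || body.contains(term);
        }
    }

    /** Position-weighted term frequency: title hits count more than body hits. */
    static double weightedTf(String term, Doc doc) {
        int length = doc.title.size() + doc.body.size();
        if (length == 0) {
            return 0.0;
        }
        double weightedCount = TITLE_WEIGHT * Collections.frequency(doc.title, term)
                + BODY_WEIGHT * Collections.frequency(doc.body, term);
        return weightedCount / length;
    }

    /** Smoothed inverse document frequency (an assumed variant, always >= 1). */
    static double idf(String term, List<Doc> corpus) {
        long docsWithTerm = corpus.stream().filter(d -> d.contains(term)).count();
        return Math.log((1.0 + corpus.size()) / (1.0 + docsWithTerm)) + 1.0;
    }

    /** Final feature weight of a term in one document. */
    static double weight(String term, Doc doc, List<Doc> corpus) {
        return weightedTf(term, doc) * idf(term, corpus);
    }

    /** Tiny made-up corpus, used only to exercise the methods. */
    static List<Doc> sampleCorpus() {
        return Arrays.asList(
                new Doc(Arrays.asList("concrete", "notice"),
                        Arrays.asList("concrete", "crack", "found", "repair")),
                new Doc(Arrays.asList("safety", "notice"),
                        Arrays.asList("scaffold", "inspection", "concrete")));
    }

    public static void main(String[] args) {
        List<Doc> corpus = sampleCorpus();
        Doc first = corpus.get(0);
        for (String term : Arrays.asList("concrete", "notice", "crack")) {
            System.out.println(term + " -> " + weight(term, first, corpus));
        }
    }
}
```

In this sketch a term that appears only in the title ("notice") receives a larger term frequency than one that appears only in the body ("crack"), which is the positional effect the abstract describes. The thesis may combine the two parameters differently.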
Keywords/Search Tags: Supervision engineering, Text categorization, TF-IDF, Naive Bayes, Supervision notice