Research On Defect Text Of Capacitive Equipment Based On Data Mining

Posted on:2021-03-30

Degree:Master

Type:Thesis

Country:China

Candidate:L Xie

Full Text:PDF

GTID:2428330611455271

Subject:Engineering

Abstract/Summary:

Once a defect occurs in capacitive equipment, especially one classified as “emergent” or “major”, it can seriously disrupt the normal operation of the power grid and even cause huge losses. Mining the defect text data of capacitive equipment to recover accurate information about the defects is therefore of great significance and value for predicting when defects will occur and of what type. The defect data of capacitive equipment is mainly text drawn from the daily operation and maintenance records of power grid enterprises, and many fields are described in natural language. These field descriptions are not standardized: clerks enter defect descriptions with considerable arbitrariness, and the length and content of the text may differ from one person to another. Defect text of capacitive equipment is thus highly complex, large in volume, and difficult to process, which makes mining it a great challenge. This thesis focuses on the mining of defect text. The research tasks and results are listed below.

(1) The term frequency-inverse document frequency (TF-IDF) algorithm was used to encode the defect text. After encoding, each defect text sample is a vector of dimension 10675. However, even the capacitive defect text with the largest number of words contains only 136 words, so the non-zero elements of its vector account for only about 1.3% of the total vector dimension. The TF-IDF encoded defect text features are therefore very sparse, and this thesis uses a non-negative matrix factorization algorithm to reduce the dimensionality of the TF-IDF encoded defect text.

(2) To address the shortcomings of TF-IDF, a feature expansion algorithm was also used to encode the defect text. Only the words with high descriptive ability were selected from the 10675 words, and the selected words were taken as the feature space of the defect text, which reduced the vector dimension. The samples were then expanded using the mutual information between these words.

(3) Based on the TF-IDF, TF-IDF with non-negative matrix factorization, and feature expansion approaches, k-means clustering and hierarchical clustering were each applied to the preprocessed defect data sets. The experimental results showed that k-means clustering achieved the best performance with TF-IDF non-negative matrix factorization: the silhouette coefficient was 0.92 and the optimal number of categories was 163.

(4) Based on the optimal clustering model, a naive Bayes classifier, a random forest, and bidirectional encoder representations from transformers (BERT) were used to classify both the original defect text and the text after feature processing. The experimental results showed that the performance of all three methods improved after feature processing. Among the three classifiers, BERT achieved the best performance after feature extension, with classification accuracy rising from 0.98 to 0.99. The accuracies of naive Bayes and random forest improved from 0.74 and 0.86 to 0.78 and 0.88, respectively.

(5) After the defect text was classified, knowledge triples were extracted using a dependency analysis approach, and Neo4j was selected as the database to store and search the defect text of capacitive equipment.
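
The sparsity argument in item (1) can be illustrated with a small sketch. This is not the thesis code: the tokenized sample texts, the function names `tfidf_encode` and `nonzero_fraction`, and the particular TF-IDF weighting (term frequency times `log(N/df) + 1`) are all assumptions made for illustration. It only shows why a short text encoded over a large vocabulary yields a mostly-zero vector, as in the reported case of 136 words against 10675 dimensions (roughly 1.3%).

```python
import math
from collections import Counter


def tfidf_encode(docs):
    """Encode tokenized documents as dense TF-IDF vectors over a shared vocabulary.

    The weighting used here (tf * (log(N / df) + 1)) is one common variant,
    chosen as an assumption; the thesis does not state its exact formula.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        if doc:  # an empty document stays an all-zero vector
            for tok, count in Counter(doc).items():
                tf = count / len(doc)
                idf = math.log(n_docs / df[tok]) + 1.0
                vec[index[tok]] = tf * idf
        vectors.append(vec)
    return vocab, vectors


def nonzero_fraction(vec):
    """Share of vector elements that carry a non-zero value."""
    return sum(1 for v in vec if v != 0.0) / len(vec)


# Hypothetical, already-tokenized defect descriptions (illustrative only).
docs = [
    ["capacitor", "oil", "leak", "at", "bushing"],
    ["capacitor", "fuse", "blown"],
    ["voltage", "transformer", "oil", "level", "low"],
    ["bushing", "crack", "found", "during", "inspection"],
]

vocab, vectors = tfidf_encode(docs)
for doc, vec in zip(docs, vectors):
    print(len(doc), "tokens ->", f"{nonzero_fraction(vec):.1%} non-zero")

# The figure quoted in the abstract: 136 words in a 10675-dimensional space.
print(f"{136 / 10675:.1%}")
```

Even on this toy vocabulary most elements of each vector are zero, and the effect grows as the vocabulary grows relative to the text length. That is the motivation the abstract gives for following TF-IDF with a dimensionality reduction step such as non-negative matrix factorization.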

Keywords/Search Tags:

Mining of defect text, BERT, Feature extension, Knowledge triplet, Capacitive equipment

Related items

1	Short Text Classification Based On The Model Of Knowledge Graph And Word Combination
2	Research On Sentiment Analysis Method Based On Short Text Feature Extension And Fused-KNN Algorithm
3	Defect Detection Method Based On Machine Vision Capacitive Screen Non-visible Area Leads
4	Research And Realization Of Domain Knowledge Graph Construction Method Based On Text Mining
5	Research On Extension Classification Knowledge Mining Based On Interval Manner
6	Semantic and Association Rule Mining-Based Knowledge Extension for Reusable Medical Equipment Lifecycle Management
7	Extreme Short Text Classification Based On Knowledge Graph Features Extension
8	Research On Text Multi-Feature Classification Algorithm Based On BERT-LSTM
9	Research On Product Patent Design Knowledge Extraction Technology Based On Text Mining
10	Research On Text Summarization Method Based On BERT Model