
Research On Content Extraction Technology Based On Unstructured Power Data

Posted on: 2024-04-05
Degree: Master
Type: Thesis
Country: China
Candidate: T J Liu
GTID: 2542306941954019
Subject: Master of Electronic Information (Professional Degree)
Abstract/Summary:
A large amount of electrical data has accumulated over time during the operation, repair, and testing of electrical equipment. Compared with structured data, which is mainly stored as key-value pairs, unstructured data exists mainly as text, images, audio, and video, and it covers a wider range of application scenarios and carries richer information. However, because of the limitations of storage methods, most unstructured data does not play its full role in supporting the operation and decision-making of electric power enterprises. With the development of intelligent and information technology in China's power grid, exploiting the potential value of unstructured data in the electric power field has become a focus of current research, and content extraction from unstructured data has become a key technology for big data extraction in electric power. This thesis therefore considers two kinds of unstructured power data, image data and text data, and takes substation wiring diagrams and power equipment test forms as its research objects. Content extraction technology is studied for these two typical types of data.

For content extraction from substation wiring diagrams, an electrical component extraction algorithm based on an improved YOLOv5 network is first proposed. Building on the original network, sliding-window image segmentation, multi-scale feature fusion, and anchor box optimisation are used to improve both the data and the algorithm. The improved model achieves an extraction accuracy of 93.0% at 52.4 FPS (frames per second). Text, connection lines, and topological relationships in the wiring diagrams are then extracted using a combination of deep learning and image processing. Finally, a variety of comparison experiments are designed to verify the effectiveness of the content extraction algorithm in this thesis. The experimental results show that the algorithm can extract the content of wiring diagrams accurately and completely, and that it provides an effective reference for the digital storage of drawings.

For content extraction from power equipment test forms, an improved EDLines extraction algorithm is proposed based on document features and image processing. First, power equipment test documents in multiple formats are preprocessed according to their format, and the form regions are extracted. The form lines are then extracted with the improved EDLines algorithm, in which threshold parameters are set to improve straight-line detection and to address short-line interference and over-segmentation. The cells are then divided according to the detected table lines. In comparison experiments, the improved algorithm reaches an accuracy of 94.36% with a processing time of 97 ms. Finally, the text content in the cells is recognised, which completes the content extraction of the test forms and lays the foundation for efficient retrieval of unstructured data.

Based on the above algorithms, a content extraction system for unstructured power data is designed using the Django framework. The system provides three functions for staff: document management, content extraction, and data export.
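
The abstract does not give the details of the improved EDLines algorithm, so the following is only a minimal sketch of the general idea behind the test-form steps: discard short segments with a length threshold, collapse fragments of the same ruling line into one coordinate, and divide cells from the remaining grid lines. The function names, the threshold values, and the assumption of a fully ruled table without merged cells are all illustrative and are not taken from the thesis.

```python
from math import hypot

# A segment is (x1, y1, x2, y2), as a line detector might return it.

def filter_short(segments, min_len):
    """Drop segments shorter than min_len to suppress short-line interference."""
    return [s for s in segments if hypot(s[2] - s[0], s[3] - s[1]) >= min_len]

def merge_positions(values, tol):
    """Cluster nearly equal coordinates and return the mean of each cluster.

    Fragments of one over-segmented ruling line end up in the same cluster.
    """
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= tol:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return [sum(c) / len(c) for c in clusters]

def grid_cells(segments, min_len=20.0, skew_tol=2.0, pos_tol=3.0):
    """Return cell boxes (left, top, right, bottom) in row-major order.

    Assumes a fully ruled table with no merged cells (hypothetical
    simplification); all thresholds are in pixels and are illustrative.
    """
    kept = filter_short(segments, min_len)
    # Near-horizontal segments contribute a row boundary (y coordinate).
    ys = [(y1 + y2) / 2 for x1, y1, x2, y2 in kept if abs(y2 - y1) <= skew_tol]
    # Near-vertical segments contribute a column boundary (x coordinate).
    xs = [(x1 + x2) / 2 for x1, y1, x2, y2 in kept if abs(x2 - x1) <= skew_tol]
    rows = merge_positions(ys, pos_tol)
    cols = merge_positions(xs, pos_tol)
    return [
        (cols[j], rows[i], cols[j + 1], rows[i + 1])
        for i in range(len(rows) - 1)
        for j in range(len(cols) - 1)
    ]
```

Each returned box could then be cropped from the page image and passed to a text recogniser. A real form with merged cells would need an additional check that each cell edge is actually covered by a detected segment.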
Keywords/Search Tags: unstructured data, extraction of wiring diagram of the electric substation, improved YOLOv5, extraction of the test form