| The enterprise stores customer feedback on product use as metadata (O&M work orders). Taking a state-owned enterprise in East China as an example, this paper extracts and cleans the metadata to obtain a valid FAQ set with matching answers, and applies Hadoop ecosystem components to enterprise operation and maintenance (O&M), a specialized field whose data is relatively closed and large in volume. The paper focuses on improving the question similarity calculation model and the question pre-classification method, and builds an automatic question-answering system for the O&M domain, so that O&M knowledge can be reused, users can obtain knowledge about business systems through self-service, and the quality of enterprise O&M services improves. The main research contents are as follows. First, a domain-specific automatic question-answering framework is designed, and the traditional process is improved: the system calculates the similarity between a new question and historical questions, quickly matches historical questions, and updates the question database automatically. Second, during word segmentation preprocessing of domain text, unregistered (out-of-vocabulary) words specific to the professional field cannot be recognized, which leads to poor segmentation. This paper improves segmentation accuracy by building professional domain dictionaries. After preprocessing steps such as word segmentation and stop-word removal, keywords are expanded and a word weight table is created for calculating question similarity. Third, based on an analysis of the questions, the question similarity calculation model is improved in both its syntactic and semantic aspects: influence factors for question length and word order are added, and semantics is also taken into account in the similarity calculation. In addition, the improved question classification method, which is based on the new question similarity calculation model, avoids classification errors caused by the uneven distribution of samples. Experiments on different experimental data verify the validity of the question similarity model and the classification method, and the impact of question pre-classification on the system as a whole is analyzed. Fourth, the system applies components of the big data ecosystem to the data actually generated by the enterprise O&M system. Using the professional dictionary, question similarity model, and classification method designed in this paper, the metadata can be analyzed and processed efficiently, ultimately realizing automated operation and maintenance. |
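
To make the third contribution more concrete, the following is a minimal sketch of how a question similarity score might combine weighted keywords with question-length and word-order factors. The abstract does not give the paper's formulas, so every formula here, the function names, the example word weight table, and the mixing coefficients `alpha`, `beta`, and `gamma` are illustrative assumptions and not the authors' model. The semantic component is not sketched. Questions are assumed to be already segmented into word lists with stop words removed.

```python
from math import sqrt


def keyword_similarity(q1, q2, weights):
    """Weighted cosine similarity over word counts (assumed formula).

    `weights` stands in for a word weight table; unknown words get 1.0.
    """
    vocab = sorted(set(q1) | set(q2))
    v1 = [weights.get(w, 1.0) * q1.count(w) for w in vocab]
    v2 = [weights.get(w, 1.0) * q2.count(w) for w in vocab]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = sqrt(sum(a * a for a in v1)) * sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0


def length_similarity(q1, q2):
    """Length factor: 1 when lengths match, smaller as they diverge (assumed)."""
    total = len(q1) + len(q2)
    return 1.0 - abs(len(q1) - len(q2)) / total if total else 0.0


def word_order_similarity(q1, q2):
    """Word-order factor from inversions among shared words (assumed)."""
    # Shared words in the order they first appear in q1.
    common = [w for w in dict.fromkeys(q1) if w in q2]
    if not common:
        return 0.0
    if len(common) == 1:
        return 1.0
    # Positions of the shared words in q2; count pairs that are out of order.
    pos = [q2.index(w) for w in common]
    inversions = sum(
        1
        for i in range(len(pos))
        for j in range(i + 1, len(pos))
        if pos[i] > pos[j]
    )
    max_inversions = len(pos) * (len(pos) - 1) // 2
    return 1.0 - inversions / max_inversions


def question_similarity(q1, q2, weights, alpha=0.6, beta=0.1, gamma=0.3):
    """Linear mix of the three factors; the coefficients are hypothetical."""
    return (
        alpha * keyword_similarity(q1, q2, weights)
        + beta * length_similarity(q1, q2)
        + gamma * word_order_similarity(q1, q2)
    )


# Hypothetical word weight table and tokenized questions.
weights = {"password": 2.0, "reset": 1.5, "login": 1.0}
new_q = ["reset", "login", "password"]
old_q = ["password", "login", "reset"]
score = question_similarity(new_q, old_q, weights)
```

With a score of this kind, a new question could be compared against historical questions and matched to the highest-scoring one above a chosen threshold. The threshold and the coefficients would need to be tuned on the enterprise's own FAQ data.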