The emergence of semi-structured data has driven the development of heterogeneous data integration in enterprises. Its schema-less, self-describing nature is convenient for applications, but the blending of structure and data also makes heterogeneous data integration difficult. Determining quickly and effectively the mapping between data items in semi-structured data and data items in structured data has therefore become an important research issue in heterogeneous data integration.

This paper exploits the particular properties of data elements, analyzes the structure of data items in semi-structured data and of data elements in structured data, and proposes an algorithm for matching data items to data elements. The matching algorithm is based on the Levenshtein distance algorithm and incorporates the ideas of the longest common subsequence, weighting, and backward focus. Once the similarity between a data item and a data element has been computed, data items in the semi-structured data can be matched to data items in the database.

The paper takes Excel as a typical example of semi-structured data. It analyzes a large number of Excel files to understand how data is filled in and summarizes the common styles. With the help of the notations in Excel, it extracts information from Excel files, including headers, data items, related data items, and so on. After a data item and its related data items are extracted, the rules governing their compositional structure are analyzed and summarized, and the rules governing their contextual relationships are analyzed and integrated. By studying the theory of data elements and their application in the oil field, the paper analyzes the semantics of data elements and summarizes the rules governing their compositional structure.
By studying the compositional structure of data items and data elements, the paper summarizes the rules that indicate high similarity. Based on these rules, a mapping algorithm is designed and the similarity between data items and data elements is calculated. Finally, the paper describes the implementation of the mapping system and demonstrates the correctness and feasibility of the proposed theory, using the standard data elements of Chinese Petroleum Company, the EPDM data dictionary, and the databases of five service companies as experimental data.
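
As a rough illustration of the similarity calculation described above, the following minimal sketch blends a normalized Levenshtein similarity with a normalized longest-common-subsequence score. It is not the paper's algorithm: the function names, the blending weight `alpha`, and the normalization by the longer string are assumptions made for illustration, and the backward-focus component is not modeled because the abstract does not define it.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(item: str, element: str, alpha: float = 0.5) -> float:
    """Blend of edit-distance and LCS similarity, in [0, 1].

    alpha is a hypothetical weight: 1.0 uses only Levenshtein,
    0.0 uses only the longest common subsequence.
    """
    if not item and not element:
        return 1.0
    longest = max(len(item), len(element))
    lev_sim = 1.0 - levenshtein(item, element) / longest
    lcs_sim = lcs_length(item, element) / longest
    return alpha * lev_sim + (1.0 - alpha) * lcs_sim


def best_match(item: str, elements: list[str], alpha: float = 0.5) -> str:
    """Return the candidate data element most similar to the data item."""
    return max(elements, key=lambda e: similarity(item, e, alpha))
```

Under these assumptions, a header extracted from a spreadsheet would be passed as `item` and the names in the data element dictionary as `elements`. In practice the weight would have to be tuned against known mappings.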