Research On Web Product Indicator Extraction Based On Ontology

Posted on: 2017-01-07
Degree: Master
Type: Thesis
Country: China
Candidate: G Z Shi
GTID: 2308330488460598
Subject: Software engineering
Abstract/Summary:
As material wealth grows and the variety of products expands rapidly, people pay increasing attention to product indicators. Analyzing these indicators first requires obtaining the underlying data. Although the Web holds a large amount of product data, the required information is hard to obtain from it quickly. Moreover, because products vary so widely, a standardized information extraction approach that works across product types is needed. Ontology-based information extraction can address both problems. First, as an abstraction of things, an ontology can be constructed for the indicators of any product. Second, the knowledge structure of an ontology facilitates the analysis of product indicator data. However, when applied to product indicators, ontology-based information extraction still faces technical problems, such as the difficulty of constructing a domain ontology and the low utilization of ontology information. To address these problems, a simplified ontology model was proposed for the extraction of product indicator information. Methods for acquiring the concepts, conceptual relations and conceptual properties of the ontology were also designed, reducing the need to involve domain experts. For information extraction, an ontology-guided path-template extraction method was proposed to make full use of the knowledge in the simplified ontology model.

The research efforts were made in the following aspects:

1. Regarding domain ontology, to reduce the involvement of domain experts, a simplified ontology model for information extraction was proposed based on the characteristics of Web product indicator information. The model keeps the basic conceptual structure of an ontology while reducing its complexity and the workload of domain experts. It therefore allows ordinary users who are familiar with the domain to construct an ontology that meets the needs of information extraction.

2. Regarding knowledge acquisition for the ontology, a domain concept extraction method based on multi-strategy decision was proposed. Word segmentation, filtering and frequency statistics were performed on a fixed number of sample documents and reference documents. Four strategies, namely DC, DR, TF-IDF and NC-Value, were applied to calculate word weights, which were then combined to determine the domain's concept words. In addition, an improved K-means algorithm was adopted to acquire the hierarchical relations between concepts automatically from domain documents.

3. Regarding information extraction, a path-based information extraction method guided by the simplified ontology was proposed. The method uses the ontology information to locate the information to be extracted, saves the path to it as a template, and then obtains the required information by applying the ontology and the template together.

4. Finally, the information extraction system was tested using smartphones as the example product. The results indicate that the simplified ontology supports the extraction of product indicator information, and that the proposed extraction method performs well in terms of accuracy and recall.
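
The path-template idea in item 3 can be illustrated with a minimal sketch. It is not the thesis's implementation, only one plausible reading of it: the ontology fragment ONTOLOGY_LABELS, the two sample pages, and the helper names path_to, learn_template and extract are all hypothetical, and the pages are assumed to be well-formed markup so that Python's standard xml.etree.ElementTree can parse them. The sketch locates a known indicator label on a sample page, saves the tag path of its containing row as the template, and then applies that template together with the ontology labels to a second page.

```python
import xml.etree.ElementTree as ET

# Hypothetical ontology fragment: indicator property -> labels seen on pages.
ONTOLOGY_LABELS = {
    "screen_size": ["Screen size", "Display size"],
    "battery_capacity": ["Battery capacity", "Battery"],
}

# Hypothetical, well-formed sample pages (real Web pages would need cleaning).
SAMPLE_PAGE = """
<html><body>
  <div><h1>Phone A</h1></div>
  <table>
    <tr><td>Screen size</td><td>5.5 inches</td></tr>
    <tr><td>Battery capacity</td><td>3000 mAh</td></tr>
  </table>
</body></html>
"""

NEW_PAGE = """
<html><body>
  <div><h1>Phone B</h1></div>
  <table>
    <tr><td>Battery</td><td>4000 mAh</td></tr>
    <tr><td>Weight</td><td>150 g</td></tr>
    <tr><td>Display size</td><td>6.1 inches</td></tr>
  </table>
</body></html>
"""


def path_to(node, parents):
    """Return the tag path from the root to `node`, e.g. './body/table/tr'."""
    tags = []
    while node in parents:  # the root has no parent, so the walk stops there
        tags.append(node.tag)
        node = parents[node]
    return "/".join(["."] + tags[::-1])


def learn_template(page, labels_by_property):
    """Find the first node whose text is a known label; save its row's path."""
    root = ET.fromstring(page.strip())
    parents = {child: parent for parent in root.iter() for child in parent}
    known = {label.lower()
             for labels in labels_by_property.values() for label in labels}
    for node in root.iter():
        text = (node.text or "").strip().lower()
        if text in known and node in parents:
            return path_to(parents[node], parents)
    return None


def extract(page, template, labels_by_property):
    """Apply the saved path, then use the ontology labels to name each value."""
    root = ET.fromstring(page.strip())
    label_to_property = {label.lower(): prop
                         for prop, labels in labels_by_property.items()
                         for label in labels}
    record = {}
    for row in root.findall(template):
        cells = ["".join(cell.itertext()).strip() for cell in row]
        if len(cells) >= 2 and cells[0].lower() in label_to_property:
            record[label_to_property[cells[0].lower()]] = cells[1]
    return record


template = learn_template(SAMPLE_PAGE, ONTOLOGY_LABELS)
print(template)
print(extract(NEW_PAGE, template, ONTOLOGY_LABELS))
```

In this sketch the template only narrows the search to the rows of the specification table, while the ontology labels decide which rows are indicators and what they are called. That division of labor is why the second page can list its rows in a different order, use synonym labels, and include an unknown row ("Weight") without breaking the extraction.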
Keywords: Information extraction, Ontology, Concept extraction, Extraction rule