Research And Implementation Of SALmap Method-based Attribute Extraction

Posted on: 2011-06-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Zhang
GTID: 2248330395457380
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of computer science, the Internet, and other new media, the amount of information is growing exponentially. Finding timely and accurate information in a mass of document pages has become an urgent problem, and traditional search engines face a growing challenge in delivering a good user experience. Against this background, Internet entity attribute extraction technology has become important and has developed accordingly. Attribute extraction has wide applications. Applied after information retrieval, it can extract specified attribute information for the related entities, turning the information search process into an information understanding process, so that a traditional information retrieval system becomes a more intelligent system that presents information to users in a more satisfactory way. Information extraction techniques can also be applied in research fields such as data mining and question answering, and these areas complement each other and develop together.
We first describe the concept of Internet entity attribute extraction based on an attribute label map, together with its architecture and key technologies. The goal is to extract valuable entity attribute information from web pages about a given entity type or product and provide it to users. Because web pages use synonymous labels for the same entity attribute, constructing the map between attributes and labels is the first step of our study. This paper presents the SALmap method: it defines data format rules with regular expressions, uses a seed method to generate a candidate attribute label set, and then constructs the attribute label map using a maximum entropy model. Based on the attribute label map, we annotate entity instances, and finally we use a hidden Markov model to extract entities with their associated attributes.
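The first two SALmap steps (data format rules as regular expressions, then seed-based generation of a candidate label set) can be illustrated with a minimal sketch. It is not the thesis implementation: the rule patterns, the seed labels, the sample label/value pairs, and the function name `candidate_labels` are all hypothetical, and the maximum entropy step that builds the final attribute label map is omitted.

```python
import re
from collections import defaultdict

# Hypothetical data-format rules: one value pattern per attribute.
FORMAT_RULES = {
    "price": re.compile(r"^[$¥]\s?\d+(?:\.\d{1,2})?$"),
    "weight": re.compile(r"^\d+(?:\.\d+)?\s?(?:kg|g)$"),
}

# Hypothetical seed labels already known to name each attribute.
SEEDS = {"price": {"price"}, "weight": {"weight"}}


def candidate_labels(pairs, seeds=SEEDS, rules=FORMAT_RULES):
    """Group page labels under each attribute they may denote.

    A label becomes a candidate for an attribute when it is a seed label
    or when its value matches that attribute's data-format rule.
    """
    candidates = defaultdict(set)
    for label, value in pairs:
        label = label.strip().lower().rstrip(":")
        for attr, rule in rules.items():
            if label in seeds[attr] or rule.match(value.strip()):
                candidates[attr].add(label)
    return dict(candidates)


# Hypothetical (label, value) pairs taken from product pages.
pairs = [
    ("Price:", "$19.99"),
    ("Our price", "$18.50"),
    ("Weight", "1.2 kg"),
    ("Net weight", "900 g"),
    ("Color", "red"),
]
label_map = candidate_labels(pairs)
```

In this sketch the synonymous labels "our price" and "net weight" are grouped with their seed attributes purely through the value format; in the method described above, a maximum entropy model would then decide which candidates enter the attribute label map.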
In the experiments, we improve the hidden Markov model algorithms for our system, which improves the accuracy of the model's input parameters, its learning ability, and the attribute extraction accuracy. To evaluate the performance of every component of the system, we examine how the improvement changes as the experiment parameters vary across the whole system. Using the attribute label map method, we built an Internet entity attribute extraction system based on Java and the Eclipse framework. The model is a domain-independent, unsupervised learning framework, which improves the portability of the system. Finally, by evaluating the system performance in combination with a practical application, we confirm that the SALmap method is effective and can significantly improve the performance of attribute extraction from web pages.
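For readers unfamiliar with how a hidden Markov model assigns attribute roles to a token sequence, the following is a minimal sketch of standard Viterbi decoding. It is the textbook algorithm, not the improved variant developed in the thesis, and the state names, token classes, and all probabilities are hypothetical values chosen for illustration.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p, floor=1e-9):
    """Return the most likely state sequence for obs under a discrete HMM."""
    if not obs:
        return []

    def lg(p):
        # Log-probability with a small floor so unseen events stay finite.
        return math.log(max(p, floor))

    # best[t][s] = (log-probability of the best path ending in s at step t,
    #               previous state on that path)
    best = [{s: (lg(start_p[s]) + lg(emit_p[s].get(obs[0], 0.0)), None)
             for s in states}]
    for t in range(1, len(obs)):
        column = {}
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p][0] + lg(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + lg(emit_p[s].get(obs[t], 0.0)), prev)
        best.append(column)

    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1]


# Hypothetical model: each token is an attribute label, a value, or other text.
states = ("label", "value", "other")
start_p = {"label": 0.5, "value": 0.1, "other": 0.4}
trans_p = {
    "label": {"label": 0.1, "value": 0.8, "other": 0.1},
    "value": {"label": 0.3, "value": 0.2, "other": 0.5},
    "other": {"label": 0.3, "value": 0.1, "other": 0.6},
}
emit_p = {
    "label": {"KNOWN_LABEL": 0.8, "WORD": 0.2},
    "value": {"MONEY": 0.7, "WORD": 0.3},
    "other": {"WORD": 0.9, "MONEY": 0.1},
}

# Token classes for a fragment such as "Price $19.99 buy".
tags = viterbi(["KNOWN_LABEL", "MONEY", "WORD"], states, start_p, trans_p, emit_p)
```

Working in log space avoids numerical underflow on long token sequences; the `floor` parameter is a crude stand-in for proper smoothing of unseen emissions.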
Keywords/Search Tags: SALmap method, attribute extraction, HMM, maximum entropy model, performance evaluation