Research on Key Technologies of Chinese Expert Retrieval

Posted on: 2012-12-21
Degree: Master
Type: Thesis
Country: China
Candidate: L N Li
GTID: 2218330368980926
Subject: Control theory and control engineering
Abstract/Summary:
Expert search is currently an active field within vertical information retrieval. In 2005, TREC, the well-known international retrieval conference, defined the expert search task as follows: given a query topic, return a ranked list of experts, together with detailed expert information, from a database. This thesis studies the key implementation steps of Chinese expert search: recognizing Chinese expert homepages, extracting unstructured Chinese expert data from the Web, and self-organizing the expert database. The main contributions are as follows:
(1) A J48-based method for entity homepage recognition. By analyzing expert resources, 2113 Chinese expert entities and their corresponding homepages were collected. Expert entity features were defined from link features and webpage content features, and these features were extracted to form a training data set. Different learning algorithms were then combined with different feature sets to recognize expert homepages, in order to find the most effective classification features and learning algorithm. The experimental results show that J48 performs best: combining link and webpage content features, it reaches an expert homepage recognition accuracy of 81.05%.
(2) An automatic template detection method based on the similarities and differences among HTML tags. Targeting the characteristics of list-type and document-type expert homepages, the method draws on lattice theory and the relationships among HTML tags to automatically mine the data template (wrapper) behind the web pages, and then locates the data region to extract the unstructured information on Chinese expert web pages.
(3) A self-organization method for the Chinese expert database that handles the fusion of different types of data. An expert database is built from list-type and document-type expert data, the data fusion problem is resolved, and the data are pruned to optimize the database.
At the same time, while users work with the system, the expert data are completed through search optimization, editing, and additions.
(4) A Chinese expert search experimental platform was constructed, and a prototype system was designed and implemented.
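For point (1), the following is a minimal sketch of homepage recognition with a C4.5-style decision tree. J48 is Weka's Java implementation of C4.5; this Python sketch is not J48. It grows an unpruned tree over binary features using the C4.5 gain-ratio criterion and omits pruning and numeric splits. The four features (`name_in_anchor`, `personal_url`, `name_in_title`, `has_bio_terms`), the cue words, the URL markers, and the toy rows are hypothetical illustrations, not the thesis's feature set or its 2113-entity data.

```python
import math
from collections import Counter
from urllib.parse import urlparse

# Hypothetical binary features: two link features, then two content features.
FEATURES = ["name_in_anchor", "personal_url", "name_in_title", "has_bio_terms"]
# Assumed cue words: "professor", "research interests", "personal profile".
BIO_TERMS = ("教授", "研究方向", "个人简介")
URL_HINTS = ("~", "people", "faculty", "teacher")  # assumed personal-page URL markers


def extract_features(name, anchor_text, url, title, body):
    """Map one (expert, candidate page) pair to the hypothetical link/content features."""
    path = urlparse(url).path.lower()
    return {
        "name_in_anchor": name in anchor_text,
        "personal_url": any(h in path for h in URL_HINTS),
        "name_in_title": name in title,
        "has_bio_terms": any(t in body for t in BIO_TERMS),
    }


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, feat):
    """C4.5 split criterion: information gain divided by split information."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feat], []).append(y)
    if len(groups) < 2:  # feature is constant here, so it cannot split
        return 0.0
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return (entropy(labels) - cond) / split_info


def build_tree(rows, labels, feats):
    """Grow an unpruned tree; leaves are class labels, nodes are (feature, branches)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not feats:
        return majority
    best = max(feats, key=lambda f: gain_ratio(rows, labels, f))
    if gain_ratio(rows, labels, best) <= 0:
        return majority
    rest = [f for f in feats if f != best]
    branches = {}
    for value in (False, True):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = (
            build_tree([rows[i] for i in idx], [labels[i] for i in idx], rest)
            if idx else majority
        )
    return (best, branches)


def predict(tree, row):
    """Walk from the root to a leaf following the row's feature values."""
    while isinstance(tree, tuple):
        feat, branches = tree
        tree = branches[row[feat]]
    return tree


def make_row(*values):
    return dict(zip(FEATURES, values))


# Toy training data (1 = expert homepage, 0 = other page); not from the thesis.
TOY_ROWS = [make_row(*v) for v in [
    (True, True, True, True), (False, False, False, True), (False, True, True, True),
    (True, False, False, False), (True, False, True, True), (False, False, False, False),
]]
TOY_LABELS = [1, 0, 1, 0, 1, 0]
tree = build_tree(TOY_ROWS, TOY_LABELS, FEATURES)
```

On these toy rows `name_in_title` alone separates the two classes (gain ratio 1.0), so the learned tree is a single split on that feature. With real data, the tree would combine several link and content features.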
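For point (2), the sketch below illustrates only the intuition behind tag-based template detection: pages generated from one template share their tag structure, so text that stays identical at the same tag position across pages is treated as template, and text that varies is treated as data. It is a simplification under stated assumptions. Each text node is keyed by its tag path plus an occurrence index, no lattice-theoretic wrapper induction is performed, and positions present in only some pages are ignored. `TextPathParser`, `text_nodes`, and `split_template` are hypothetical names.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they never enter the open-tag stack.
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "wbr"}


class TextPathParser(HTMLParser):
    """Collect (tag path, text) for every non-empty text node, in document order."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating children left unclosed.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/".join(self.stack), text))


def text_nodes(html):
    """Key each text node by (tag path, occurrence index of that path)."""
    parser = TextPathParser()
    parser.feed(html)
    parser.close()
    counts, keyed = {}, {}
    for path, text in parser.nodes:
        k = counts.get(path, 0)
        counts[path] = k + 1
        keyed[(path, k)] = text
    return keyed


def split_template(pages):
    """Identical text at the same position in every page -> template; varying text -> data.

    Positions missing from any page are ignored in this simplified sketch.
    """
    keyed = [text_nodes(html) for html in pages]
    common = set(keyed[0]).intersection(*keyed[1:])
    template, data = {}, {}
    for key in sorted(common):
        values = [k[key] for k in keyed]
        if all(v == values[0] for v in values):
            template[key] = values[0]
        else:
            data[key] = values
    return template, data
```

For two list-style pages that differ only in an expert's name and title, the label cells ("Name:", "Title:") land in the template and the two `<span>` values come out as data fields, one list of values per position.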
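For point (3), here is a minimal sketch of fusing records extracted from list-type and document-type pages and then pruning the result. The abstract does not specify the fusion or pruning rules. The identity rule (normalized name plus affiliation), the conflict rule (document-page values override list-page values), and the pruning thresholds below are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def normalize(value: str) -> str:
    """Drop all whitespace and lowercase, so trivially different spellings match."""
    return "".join(value.split()).lower()


def expert_key(rec: Record) -> Tuple[str, str]:
    # Hypothetical identity rule: same normalized name + affiliation = same expert.
    return normalize(rec.get("name", "")), normalize(rec.get("affiliation", ""))


def fuse(list_records: List[Record], doc_records: List[Record]) -> List[Record]:
    """Merge records of the same expert; empty values never overwrite filled ones.

    List-page records are applied first and document-page records second, so on a
    conflict the document-page value wins (an assumed priority, on the assumption
    that document homepages carry fuller details).
    """
    merged: Dict[Tuple[str, str], Record] = {}
    for rec in list_records + doc_records:
        target = merged.setdefault(expert_key(rec), {})
        for field, value in rec.items():
            if value:
                target[field] = value
    return list(merged.values())


def prune(records: List[Record], required=("name", "affiliation"), min_fields=3) -> List[Record]:
    """Drop records missing identity fields or carrying too few filled fields."""
    return [r for r in records if all(r.get(f) for f in required) and len(r) >= min_fields]
```

Because only non-empty values are stored, `len(r)` in `prune` counts filled fields. A record that holds nothing but a name and an affiliation is therefore dropped under the default threshold.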
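The TREC task definition at the start of the abstract (given a query topic, return a ranked list of experts) can be illustrated with a deliberately simple profile-based ranker. This is a stand-in, not the prototype system's method, which the abstract does not describe. It assumes each expert's profile has already been segmented into terms (Chinese text needs word segmentation first) and scores experts by a TF-IDF match against the query. `rank_experts` and the sample profiles are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def rank_experts(query_terms: List[str], profiles: Dict[str, List[str]],
                 top_k: int = 10) -> List[Tuple[str, float]]:
    """Rank experts by a TF-IDF match between query terms and profile terms.

    `profiles` maps an expert's name to the (already segmented) terms of their
    profile; experts sharing no query term are left out of the ranking.
    """
    n = len(profiles)
    df = Counter()
    for terms in profiles.values():
        df.update(set(terms))  # document frequency: profiles containing each term
    scored = []
    for expert, terms in profiles.items():
        tf = Counter(terms)
        score = sum(
            tf[q] / len(terms) * math.log(1 + n / df[q])  # length-normalized tf x idf
            for q in query_terms if tf[q] > 0
        )
        if score > 0:
            scored.append((expert, score))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))  # best first, ties by name
    return scored[:top_k]
```

A real system would typically add the homepage and fused database fields from the steps above as extra evidence. This sketch only shows the shape of the task: query in, ranked expert list out.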
Keywords/Search Tags: Chinese expert search, List webpage data extraction, Document webpage data extraction, Expert database self-organization