
English Entity Answer Extraction and Homepage Finding

Posted on: 2011-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Xu
Full Text: PDF
GTID: 2208330332978733
Subject: Computer application technology
Abstract/Summary:
Entity answer finding, a key component of question answering (Q&A) systems and information extraction, is an important task in entity search. In TREC 2009, the entity task requires extracting relevant answers and entity homepages from the Internet and a relevant data set by using each entity's properties, type, and context. How to use natural language information effectively to retrieve texts, passages, and answers has therefore become a core issue. This thesis investigates the critical steps in implementing English entity answer extraction: query expansion, passage segmentation, text and passage relevance calculation, named entity recognition, answer entity extraction, table-based answer extraction, and homepage finding. The main contributions are as follows:

1. A method of entity answer extraction for the TREC entity task is put forward. The method considers the text, passage, and entity relevance of a candidate answer. The text relevance is the similarity between the title of the web page and the query; the passage relevance is the similarity between the sentences of a passage and the query; the entity relevance is a score based on the distribution density of entities and query words in the passage. The overall score of a candidate entity is a linear combination of these three scores, and the entity with the highest score is extracted as the final answer. Experiments on the TREC 2009 entity task show that the method is effective, with an NDCG of 0.30.

2. A method of extracting entity answers from tables is provided for the TREC entity task. Entity recognition in tables has low precision because table cells lack context, so we use the features of the table and the tags in the web page to combine the table title with the table elements, which extends the context available for entity recognition. In addition, drawing on the statistics of entity recognition in the relevant text, we perform entity recognition on all elements of the table and combine it with the score calculation method of entity answer extraction, which achieves a better result.

3. An entity homepage recognition method based on AdaBoost is proposed. A number of entities and their corresponding homepages were collected manually. We define features related to links and web page contents, and extract these features to form the training data set. Candidate homepages for an entity are retrieved through Google search, and AdaBoost is then used to recognize the homepage. This method shows very good results.

4. A prototype system is designed and implemented, and it was tested on the Entity Track of TREC 2009.
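
The linear combination of text, passage, and entity relevance described in point 1 can be illustrated with a minimal sketch. The abstract does not give the weights or the exact form of the three scores, so the `Candidate` class, the function names, the weight values, and the assumption that each score lies in [0, 1] are all hypothetical and chosen only for illustration; this is not the thesis implementation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate answer entity with three precomputed relevance scores.

    Each score is assumed (for illustration) to lie in [0, 1].
    """
    name: str
    text_relevance: float     # similarity between page title and query
    passage_relevance: float  # similarity between passage sentences and query
    entity_relevance: float   # density of entity and query words in the passage


def combined_score(candidate, weights=(0.3, 0.3, 0.4)):
    """Return a linear combination of the three relevance scores.

    The default weights are placeholders, not values from the thesis.
    """
    w_text, w_passage, w_entity = weights
    return (w_text * candidate.text_relevance
            + w_passage * candidate.passage_relevance
            + w_entity * candidate.entity_relevance)


def rank_candidates(candidates, weights=(0.3, 0.3, 0.4)):
    """Sort candidates from highest to lowest combined score."""
    return sorted(candidates,
                  key=lambda c: combined_score(c, weights),
                  reverse=True)


def best_answer(candidates, weights=(0.3, 0.3, 0.4)):
    """Return the highest-scoring candidate, or None if there are none."""
    ranked = rank_candidates(candidates, weights)
    return ranked[0] if ranked else None


if __name__ == "__main__":
    # Hypothetical candidates with made-up scores.
    pool = [
        Candidate("Entity A", 0.9, 0.2, 0.1),
        Candidate("Entity B", 0.4, 0.6, 0.8),
    ]
    for c in rank_candidates(pool):
        print(c.name, round(combined_score(c), 3))
```

In practice the weights would be tuned on held-out queries, and the ranked list (rather than only the top entity) would be what a ranking measure such as NDCG evaluates.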
Keywords/Search Tags: TREC Entity Track Task, Text Relevance, Passage Relevance, Entity Relevance, Entity Answer Extraction, Table Entity Answer Extraction, Homepage Recognition