Research On Entity Retrieval Based On Terms And Categories Information

Posted on: 2020-04-19
Degree: Master
Type: Thesis
Country: China
Candidate: T S Yan
Full Text: PDF
GTID: 2428330596979671
Subject: Computer software and theory
Abstract/Summary:
In recent years, surveys have found that entity retrieval accounts for a growing share of information retrieval. Entity retrieval returns specific entities or entity attributes in response to a user's query. Unlike traditional retrieval, which presents results as web pages, it can quickly and accurately return a list of entities relevant to the query together with related information, so the user does not need to perform a secondary search, which improves the search experience. An entity is a uniquely identified object or thing (such as a person, organization, or place) that is characterized by its categories, attributes, and relationships with other entities. Entity features play an important role in retrieval, but their multifaceted nature can harm retrieval performance. Based on an analysis of entity characteristics, this thesis incorporates category information into retrieval to improve accuracy. The main research contents are as follows:

(1) Because flat documents have limited ability to describe semantics, this thesis retrieves over structured documents using a field-weighting method. A hierarchical entity model is constructed by analyzing the internal features of structured documents. Since different fields of a document contribute differently to its topic, the BM25F algorithm is used as the basic method for entity retrieval. To model the internal structure of structured documents, the DBpedia dataset is used as the knowledge base: fields are divided according to the different kinds of content expressed in the document collection, a combination of fields is selected, and the fields are weighted. Experiments verify that introducing document structure information improves retrieval performance.

(2) Building on the term-based retrieval method, entity type information is introduced to construct an entity retrieval model based on term and category information (T-CER). The characteristics of category information are analyzed from three aspects. First, a category similarity measure based on probability distributions is proposed. Second, the relevance level between an entity and its assigned categories is represented according to the category hierarchy. Finally, the category hierarchy is defined and four taxonomies of different sizes are constructed. The entity retrieval model combines the term similarity matching method with the category similarity matching method. Assuming an idealized "oracle" that provides the correct entity category for a given query, experiments verify that using category hierarchy information improves retrieval performance, and different combinations of methods within the retrieval model are compared. The results show that retrieval performance is best when strict filtering is applied using the most specific categories in the Wikipedia taxonomy.

(3) In real search scenarios, users are accustomed to the "single search box" paradigm, and asking them to annotate queries with types may cause cognitive overload in many situations. To address this problem, this thesis uses a supervised learning-to-rank (LTR) method to automatically identify the target entity categories of a query. Drawing on an analysis of existing automatic category identification methods, the similarity between categories and queries and the characteristics of category labels are examined, and 25 features are extracted for LTR-based category ranking. The experimental results show that the LTR-based category identification method automatically assigns valid categories to queries.
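The field-weighted scoring described in (1) can be illustrated with a minimal sketch of a simple BM25F variant. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the field names (label, categories, abstract), the field weights, the toy entities, the parameter values k1 and b (a single b is shared by all fields for brevity), and the smoothed non-negative IDF form.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def bm25f_scores(query, docs, weights, k1=1.2, b=0.75):
    """Score each fielded document against the query with a simple BM25F variant.

    docs    : list of dicts mapping field name -> text
    weights : dict mapping field name -> field weight (illustrative values)
    """
    fields = list(weights)
    tokenized = [{f: tokenize(d.get(f, "")) for f in fields} for d in docs]
    n_docs = len(docs)
    # Average length of each field across the collection.
    avg_len = {f: sum(len(td[f]) for td in tokenized) / n_docs for f in fields}

    # Document frequency: number of documents containing the term in any field.
    df = Counter()
    for td in tokenized:
        vocab = set()
        for f in fields:
            vocab.update(td[f])
        df.update(vocab)

    scores = []
    for td in tokenized:
        counts = {f: Counter(td[f]) for f in fields}
        score = 0.0
        for term in set(tokenize(query)):
            # Combine length-normalized term frequencies across fields
            # before applying saturation; this is what distinguishes BM25F
            # from summing per-field BM25 scores.
            pseudo_tf = 0.0
            for f in fields:
                tf = counts[f][term]
                if tf == 0 or avg_len[f] == 0:
                    continue
                norm = 1 - b + b * len(td[f]) / avg_len[f]
                pseudo_tf += weights[f] * tf / norm
            if pseudo_tf == 0:
                continue
            # Smoothed, non-negative IDF (an assumed variant).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * pseudo_tf / (k1 + pseudo_tf)
        scores.append(score)
    return scores


# Hypothetical DBpedia-like entity records, invented for this example.
entities = [
    {"label": "Berlin",
     "categories": "capitals in europe cities in germany",
     "abstract": "berlin is the capital and largest city of germany"},
    {"label": "Germany",
     "categories": "countries in europe",
     "abstract": "germany is a country in central europe its capital is berlin"},
    {"label": "Paris",
     "categories": "capitals in europe cities in france",
     "abstract": "paris is the capital of france"},
]
weights = {"label": 3.0, "categories": 2.0, "abstract": 1.0}

scores = bm25f_scores("berlin", entities, weights)
ranking = sorted(zip(scores, (e["label"] for e in entities)), reverse=True)
for score, label in ranking:
    print(f"{label}: {score:.3f}")
```

With these assumed weights, a match in the label field counts for more than a match in the abstract, so the entity whose label matches the query ranks above an entity that only mentions the query term in its abstract. In practice the field weights would be tuned on held-out queries rather than set by hand.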
Keywords/Search Tags:Entity retrieval, Semantic search, Structured search, Entity categories, Query understanding