Research On Web Resources Based Construction Of Bilingual Dictionaries And Query Processing

Posted on: 2015-11-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Han
GTID: 1108330476955916
Subject: Computer Science and Technology
Abstract/Summary:
Since the invention of the World Wide Web (the Web for short), Web-resource-based technology, which solves scientific and technical problems on the Web with the help of resources available on the Web itself, has driven advances in many areas of Web technology, such as data extraction, knowledge base construction, Web search, and social networks. In this context, we study the automatic construction of bilingual dictionaries from Web page resources, as well as query processing based on Web knowledge bases and Web crowdsourcing. The contributions of this thesis are as follows.

1. Construction of bilingual dictionaries from Web pages.
The Chinese Web contains billions of Chinese-English bilingual Web pages, which makes them an attractive source for building bilingual dictionaries. We propose a data-centric method that constructs bilingual dictionaries from these pages without using any pre-built resources. The method is based on statistics of bilingual entries and runs in linear time. It avoids the performance bottleneck that machine learning algorithms face when processing massive numbers of Web pages, while also achieving better accuracy and coverage.

2. Semantic-enhanced spatial keyword search based on Web knowledge bases.
Traditional spatial keyword search methods consider only the textual relevance of POIs (points of interest) to the query keywords and neglect the semantics of the query. To address this limitation, we introduce a semantic-enhanced spatial keyword search method, named S3, which uses knowledge bases to capture query semantics and a ranking function that considers both semantic distance and spatial distance. To support instant search over large-scale POI data sets, we also devise a novel index structure, the GRTree, and develop effective pruning techniques based on it.

3. Query structure interpretation based on crowdsourcing.
Existing methods for query structure interpretation require the analyzed queries to be classified into the target domain, which limits their ability to interpret the noisy queries found in real query logs. To address this problem, we propose a hybrid human-machine method based on crowdsourcing. The method selects a small number of query terms, asks crowd workers to interpret them, and then infers the interpretations of the remaining terms by probabilistic inference over a similarity graph of query terms. To further improve performance within a given budget, we propose a method that selects the most useful query terms for crowdsourcing according to their domain relevance and their ability to reduce errors.
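The abstract does not say which statistics the dictionary-construction method in the first contribution collects. The Python sketch below is only an illustration under our own assumptions: it treats a run of Chinese characters followed by a parenthesized English term (for example `机器学习(machine learning)`) as a candidate bilingual entry, counts every suffix of the Chinese run in a single pass, and keeps the longest of the most frequent suffixes. The pattern, the `extract_dictionary` function, and the length limits are hypothetical and are not taken from the thesis. Each match contributes at most `max_len` suffixes, so the work grows linearly with the input.

```python
import re
from collections import Counter, defaultdict

# Hypothetical pattern: a run of Chinese characters followed by an English
# term in ASCII or full-width parentheses, e.g. "机器学习(machine learning)".
PAIR_PATTERN = re.compile(
    r"([\u4e00-\u9fff]+)\s*[(（]\s*([A-Za-z][A-Za-z\s\-]*?)\s*[)）]"
)

def extract_dictionary(pages, min_len=2, max_len=6):
    """Build an English -> Chinese dictionary in one pass over the pages."""
    counts = defaultdict(Counter)  # English term -> Counter of Chinese suffixes
    for text in pages:
        for match in PAIR_PATTERN.finditer(text):
            chinese = match.group(1)
            english = " ".join(match.group(2).lower().split())
            # The left boundary of the Chinese term is unknown, so count
            # every suffix of the Chinese run within the length limits.
            for n in range(min_len, min(max_len, len(chinese)) + 1):
                counts[english][chinese[-n:]] += 1
    dictionary = {}
    for english, candidates in counts.items():
        best = max(candidates.values())
        # Among the most frequent suffixes, prefer the longest one.
        dictionary[english] = max(
            (c for c, f in candidates.items() if f == best), key=len
        )
    return dictionary
```

Counting suffixes resolves the unknown boundary statistically: accidental prefixes such as `使用` or `基于` vary from page to page, while the true translation recurs with the same English term.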
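The S3 ranking in the second contribution combines semantic and spatial distance, but the abstract gives neither the exact scoring function nor the details of the GRTree. As a hypothetical sketch, the code below measures semantic distance as the breadth-first hop count from a query entity to a POI's entities in a toy knowledge-base graph, scales both distances to [0, 1], and mixes them with an assumed weight `alpha`; lower scores rank higher. The functions `semantic_distance` and `rank_pois` and all parameters are illustrative assumptions. The sketch scans every POI linearly and does not attempt the GRTree index or its pruning.

```python
import heapq
import math
from collections import deque

def semantic_distance(graph, source, targets, max_hops=3):
    """Hop count from the query entity to the nearest POI entity (BFS)."""
    targets = set(targets)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if node in targets:
            return hops
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return math.inf  # no POI entity within max_hops

def rank_pois(graph, query_entity, query_loc, pois, k=10,
              alpha=0.5, max_hops=3, max_dist=10.0):
    """Return the names of the k POIs with the lowest combined score.

    pois: list of (name, (x, y), entities); all parameters are hypothetical.
    """
    scored = []
    for name, location, entities in pois:
        sem = semantic_distance(graph, query_entity, entities, max_hops)
        if sem == math.inf:
            continue  # semantically unrelated POIs are skipped
        spatial = math.dist(query_loc, location)
        # Both distances are scaled to [0, 1] before weighting.
        score = alpha * sem / max_hops + (1 - alpha) * min(spatial / max_dist, 1.0)
        scored.append((score, name))
    return [name for _, name in heapq.nsmallest(k, scored)]
```

With a graph in which "seafood" links to "crab" directly and to "sushi" through "fish", a query for "seafood" can rank a farther crab restaurant above a nearer sushi bar, which a purely textual match on the keyword would miss entirely.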
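For the probabilistic inference step in the third contribution, the sketch below substitutes a standard technique, iterative label propagation on the similarity graph, which may differ from the inference method used in the thesis. Crowd-labelled terms keep fixed one-hot distributions, and every other term repeatedly takes the similarity-weighted average of its neighbours' distributions. The graph format, the label set, the iteration count, and the `propagate_labels` function are assumptions; every term, including each seed, must appear as a key of `similarity`.

```python
def propagate_labels(similarity, seeds, labels, iterations=20):
    """Infer a label distribution for every term from crowd-labelled seeds.

    similarity: {term: {neighbour: weight}}; every term must be a key.
    seeds: {term: label} answers collected from crowd workers.
    labels: candidate interpretations, e.g. ["brand", "color"].
    """
    uniform = {lab: 1.0 / len(labels) for lab in labels}
    dist = {
        term: ({lab: float(lab == seeds[term]) for lab in labels}
               if term in seeds else dict(uniform))
        for term in similarity
    }
    for _ in range(iterations):
        updated = {}
        for term, neighbours in similarity.items():
            total = sum(neighbours.values())
            if term in seeds or total == 0:
                # Crowd answers (and isolated terms) stay fixed.
                updated[term] = dist[term]
                continue
            updated[term] = {
                lab: sum(w * dist[n][lab] for n, w in neighbours.items()) / total
                for lab in labels
            }
        dist = updated
    return dist
```

Because each update is a weighted average of probability distributions, every inferred distribution still sums to 1, and its highest-probability label can be read off as the term's interpretation.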
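The abstract says that terms are chosen for crowdsourcing by their domain relevance and error-reduction ability, but it does not define either quantity. One hypothetical way to combine them, sketched below, multiplies an assumed per-term relevance score by a proxy for error reduction: the entropy of the term's current inferred distribution plus the similarity-weighted entropy of its neighbours. The scoring rule and the `select_terms` and `entropy` helpers are illustrative only; the selection method in the thesis may differ.

```python
import math

def entropy(distribution):
    """Shannon entropy (natural log) of a label distribution."""
    return -sum(p * math.log(p) for p in distribution.values() if p > 0)

def select_terms(candidates, distributions, similarity, relevance, budget):
    """Return up to `budget` candidate terms to send to crowd workers.

    Hypothetical score: domain relevance times an error-reduction proxy.
    """
    def score(term):
        # Uncertainty of the term itself plus the similarity-weighted
        # uncertainty of its neighbours, which its answer would also inform.
        reach = entropy(distributions[term]) + sum(
            w * entropy(distributions[n])
            for n, w in similarity.get(term, {}).items()
        )
        return relevance.get(term, 0.0) * reach

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(candidates, key=lambda t: (-score(t), t))[:budget]
```

A term that is already certain and has no uncertain neighbours scores zero, so the budget is spent on relevant terms whose answers are likely to change the most inferred interpretations.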
Keywords/Search Tags: Web Resources, Bilingual Dictionaries, Spatial Keyword Search, Semantic, Query Structure Interpretation