
Research On Ontology-based Enterprise Search Optimization

Posted on: 2013-02-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C C Liu
GTID: 1118330371482711
Subject: Computer application technology
Abstract/Summary:
The rise of Google, Baidu, and other search engines has brought Internet users around the world into the information retrieval era, letting people acquire the information they need from the web quickly and easily. At the same time, enterprise information is growing explosively, which makes efficient enterprise information acquisition services a pressing need for corporate users. These users need to obtain valuable business information from the vast amount of data held both inside and outside the enterprise, to support their daily decision making and improve their work efficiency. Enterprise search has emerged in response: it tries to integrate information from multiple sources and provide satisfactory search services.

Enterprise search differs from web search in several ways. The objects it searches can be structured, unstructured, or semi-structured data, in forms such as PDF, XML, plain text, and database tuples; its search range extends from internal websites and electronic text held by the organisation to external websites; and the information it searches usually belongs to a concrete domain rather than the open domain of the Internet. In addition, enterprise search requires more accurate results and has to support more kinds of search modes. These features mean that enterprise search cannot reuse web search techniques as they are, and researchers have to develop new techniques suited to the enterprise search environment.

Ontology has emerged as a knowledge representation and organization method with strong semantic description and reasoning abilities, and it is considered a useful tool for guiding the information search process. Many researchers have already used it to improve search quality. After analysing the deficiencies of current enterprise search techniques, this dissertation brings ontology into enterprise search optimization research.
First, because ontology is the foundation of enterprise search optimization, we develop a novel fuzzy information ontology modeling method to make up for the shortcomings of current techniques in representing and reasoning about uncertain knowledge. We then analyze the role and effect of ontology in document retrieval and database querying, and propose a new ontology-based query extension model, a document ranking model, and a method for diversifying relational database search results.

(1) Fuzzy domain ontology modeling.
After analyzing the features of domain knowledge, we set out the requirements for fuzzy domain ontology modeling and point out the deficiencies of current ontologies in representing and reasoning about fuzzy knowledge. We then develop a new fuzzy ontology modeling method and apply it to build a fuzzy geospatial ontology model, FGSO, which lets fuzzy geospatial information be conveniently shared and reused by different systems. First, FGSO extends the ontology language OWL to describe the complex fuzzy semantics of spatial relations and construct the spatial relation ontology, and a set of fuzzy interfaces is built to support fuzzy spatial relation reasoning. Second, a restricted fuzzy language, FR-OWL, is defined to build the fuzzy geospatial ontology, which describes all types of fuzzy geospatial information besides fuzzy spatial relations in a uniform format and supports fuzzy semantic reasoning. Finally, a conversion algorithm is developed to help these two ontologies work together. Based on FGSO, a fuzzy geospatial semantic retrieval system is developed, which realizes part of the functions of an enterprise search engine, and its search quality is validated by a set of experiments.

(2) Ontology-based query extension.
After analyzing how semantic query extension affects enterprise document retrieval, we propose a novel model for calculating the semantic similarity of concepts, which can be applied to extend users' queries so that they reflect users' actual information needs more accurately. The model considers both concept restrictions and the ontology hierarchy when evaluating concept similarity. First, a restriction comparison algorithm compares the semantic descriptions of two concepts in detail. Next, the distance between the two concepts within the ontology hierarchy is calculated to compensate for incomplete restrictions. Finally, the two evaluation results are combined to compute the concept similarity value. Experimental results show that our model yields more accurate similarity values and better search quality than traditional models.

(3) Ontology-based document ranking.
Considering that the rich semantic knowledge in a domain ontology can guide query-document understanding and matching, we propose a new document ranking model. The model mines concepts from documents and queries and uses them to calculate concept-based document relevance. It also mines the semantic relations implied in each document and query, constructs a document relation graph to model the document's content and a query relation graph to model the user's query intention, and applies a graph matching algorithm to calculate relation-based document relevance. The two relevance values are then combined to produce the final document ranking. Experiments on test datasets show that the ranking model improves both search quality and time efficiency.

(4) Diversification of keyword search over relational databases.
Traditional keyword search techniques for databases rank by tuple relevance alone and do not consider tuple novelty. As a result, the top-n results are highly relevant to the query but similar to each other, and users waste a lot of time finding what they really need. This dissertation proposes an ontology-based search result diversification method to handle the problem. First, a set of semantic graphs is extracted from the ontology to model all of the user's possible query intentions. Second, these graphs are ranked based on how well they reflect the user's information needs and on the similarity between graphs. Finally, the query result list is constructed with the help of the graph list. Experimental results on a public test dataset show that our method can minimize the risk of user dissatisfaction.

To sum up, this dissertation studies several core problems in enterprise search optimization, and its achievements provide a valuable reference for developing highly useful enterprise search engines. However, many open problems in the domain remain to be resolved.
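
The graph-ranking step in part (4), which weighs how well each semantic graph reflects the user's need against its similarity to other graphs, can be illustrated with a small sketch. The abstract does not give the actual scoring formula, so the code below substitutes a generic greedy relevance-versus-redundancy selection (in the style of maximal marginal relevance). The function names, the Jaccard similarity over concept sets, the `trade_off` parameter, and the sample data are all assumptions made for illustration, not the dissertation's method.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two concept sets; a stand-in for graph similarity (assumed)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def diversify(
    relevance: Dict[str, float],
    concepts: Dict[str, Set[str]],
    k: int,
    trade_off: float = 0.7,
) -> List[str]:
    """Greedily pick up to k graphs, balancing relevance against redundancy.

    relevance: hypothetical score of how well each graph matches the query.
    concepts:  hypothetical set of ontology concepts each graph covers.
    trade_off: 1.0 ranks by relevance only; lower values favour novelty.
    """
    selected: List[str] = []
    remaining = set(relevance)
    while remaining and len(selected) < k:

        def score(graph_id: str) -> float:
            # Redundancy is the highest similarity to any graph already chosen.
            redundancy = max(
                (jaccard(concepts[graph_id], concepts[s]) for s in selected),
                default=0.0,
            )
            return trade_off * relevance[graph_id] - (1 - trade_off) * redundancy

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Made-up example: g1 and g2 are near-duplicate interpretations of the query.
relevance = {"g1": 0.9, "g2": 0.85, "g3": 0.6}
concepts = {
    "g1": {"author", "paper"},
    "g2": {"author", "paper", "venue"},
    "g3": {"project", "department"},
}
relevance_only = diversify(relevance, concepts, k=2, trade_off=1.0)
diversified = diversify(relevance, concepts, k=2, trade_off=0.5)
print(relevance_only, diversified)
```

With relevance alone, the two near-duplicate graphs take both top slots; with an equal weighting, the second slot goes to the less relevant but novel graph, which is the effect the diversification method aims for.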
Keywords/Search Tags: Ontology, Enterprise search, Fuzzy ontology language, Fuzzy semantic reasoning, Query extension, Document relevance calculating, Search result diversification