
Semantic Indexing Based On Ontology

Posted on: 2016-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: S S Hou
GTID: 2308330461975718
Subject: Computer application technology
Abstract/Summary:
With the rapid growth of Internet data, users rely on search engines to obtain information, but it is becoming increasingly difficult for them to find exactly the data they want. A search engine crawls web data from the Internet, indexes it, and then returns the data whose index entries match the user's query. The index therefore plays a central role. However, the index is term-based and lacks semantics: because of polysemy (one word with several meanings) and synonymy (several words with one meaning), precision and recall fall short of users' expectations. Many researchers have proposed solutions, such as latent semantic indexing and text indexing, but these methods are fragmented and lack an overall framework. The author therefore proposes a new method: re-index web documents by classifying each document according to the relevance between the document and a concept, and store the resulting index using the storage mechanism proposed in this thesis.

The details are as follows. First, build the term-to-document-list mapping (term→doc list) with a traditional inverted index. Next, obtain each term's concept list from a term-entity table, and use the Vector Space Model (VSM) to compute the feature-vector matrix of the concept list and the feature vector of every web document. Reduce the dimension of the document feature vectors and assemble them into the web document list eigenvector matrix (WDLEM). Then compute the correlation between the concept list eigenvector matrix (CLEM) and the WDLEM to obtain the relevance of each document to each concept. Classify each document under the concept most relevant to it, which yields concept index entries (concept→document list). Finally, merge the entries that share the same concept to obtain a complete semantic index. This method solves the semantic indexing problem at the logical level.

The next problem is the physical storage management of the semantic index. If the index is stored sequentially, query efficiency is unacceptable, so a well-organized storage structure is needed to speed up search.
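The logical indexing steps above (inverted index, concept lookup, VSM vectors, correlation, classification, merging) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's algorithm: the term-entity table and the per-concept word profiles (`term_entity`, `concept_profiles`) are hypothetical toy data, raw term frequencies stand in for whatever VSM weighting the author uses, "dimension reduction" is approximated by projecting each document onto the vocabulary of its candidate concepts, and cosine similarity is assumed as the correlation measure.

```python
from collections import Counter, defaultdict
import math


def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Traditional inverted index: term -> sorted list of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def vectorize(tokens, vocab):
    """VSM vector of raw term frequencies over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]


def cosine(u, v):
    """Cosine similarity (assumed here as the correlation measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_concept_index(docs, term_entity, concept_profiles):
    """Classify each document under its most relevant concept; merge by concept."""
    inverted = build_inverted_index(docs)          # term -> doc list
    concept_index = defaultdict(set)
    for term, doc_ids in inverted.items():
        concepts = term_entity.get(term, [])       # term -> concept list
        if not concepts:
            continue
        # Reduced feature space (assumption): words describing the candidate concepts
        vocab = sorted({w for c in concepts for w in concept_profiles[c]})
        clem = {c: vectorize(concept_profiles[c], vocab) for c in concepts}
        for doc_id in doc_ids:
            doc_vec = vectorize(tokenize(docs[doc_id]), vocab)
            scores = {c: cosine(vec, doc_vec) for c, vec in clem.items()}
            best = max(scores, key=scores.get)
            if scores[best] > 0:                   # skip documents with no overlap
                concept_index[best].add(doc_id)    # sets merge entries per concept
    return {c: sorted(ids) for c, ids in concept_index.items()}


# Hypothetical toy data, for illustration only
docs = {
    1: "jaguar speed engine car",
    2: "jaguar jungle prey cat",
    3: "apple fruit juice",
}
term_entity = {
    "jaguar": ["Jaguar_(car)", "Jaguar_(animal)"],
    "apple": ["Apple_(fruit)", "Apple_Inc."],
}
concept_profiles = {
    "Jaguar_(car)": ["car", "engine", "speed", "brand"],
    "Jaguar_(animal)": ["cat", "jungle", "prey", "animal"],
    "Apple_(fruit)": ["fruit", "juice", "tree"],
    "Apple_Inc.": ["iphone", "company", "computer"],
}

if __name__ == "__main__":
    print(build_concept_index(docs, term_entity, concept_profiles))
    # {'Jaguar_(car)': [1], 'Jaguar_(animal)': [2], 'Apple_(fruit)': [3]}
```

The polysemous term "jaguar" no longer maps to one mixed posting list: document 1 lands under the car concept and document 2 under the animal concept. Accumulating into per-concept sets is one simple way to realize the final "merge by the same concept" step.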
To solve this problem, we design a data structure for the semantic index: the semantic indexing tree. It combines the ontology concept tree, built from the "is-a" relationships between concepts, with the concept index files. It consists of three parts: the ontology concept tree, the instance-index table, and the semantic inverted index files. The semantic inverted index is divided into many small files according to concept, and each small file can be located quickly by searching the tree.

The main contributions of the author are:
1. A semantic indexing method is proposed and designed, and the corresponding algorithm is given. Experiments show that accuracy is greatly improved.
2. A storage structure for the semantic index is designed. Experiments show that its average query efficiency is high and has a clear advantage over other indexing approaches.
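The semantic indexing tree described above can be sketched as follows. This is a hypothetical illustration, not the thesis's data structure: it assumes the instance-index table maps each instance (term) to root-to-concept "is-a" paths, and the file names and the in-memory `index_files` dict are placeholders standing in for the small semantic inverted index files on disk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ConceptNode:
    """Node of the ontology concept tree; children are linked by "is-a"."""
    name: str
    children: Dict[str, "ConceptNode"] = field(default_factory=dict)
    index_file: Optional[str] = None  # hypothetical name of this concept's small index file

    def add_child(self, child: "ConceptNode") -> "ConceptNode":
        self.children[child.name] = child
        return child


def locate(root: ConceptNode, path: List[str]) -> Optional[ConceptNode]:
    """Descend from the root along an is-a path; return None if the path breaks."""
    if not path or path[0] != root.name:
        return None
    node = root
    for name in path[1:]:
        node = node.children.get(name)
        if node is None:
            return None
    return node


def search(root, instance_table, index_files, term):
    """Return {concept: doc ids} for every concept the term may denote."""
    results = {}
    for path in instance_table.get(term, []):
        node = locate(root, path)
        if node is not None and node.index_file is not None:
            # Only this concept's small file is read, not the whole index
            results[node.name] = index_files[node.index_file]
    return results


# Hypothetical ontology concept tree
root = ConceptNode("Thing")
animal = root.add_child(ConceptNode("Animal"))
vehicle = root.add_child(ConceptNode("Vehicle"))
animal.add_child(ConceptNode("Jaguar_(animal)", index_file="jaguar_animal.idx"))
vehicle.add_child(ConceptNode("Jaguar_(car)", index_file="jaguar_car.idx"))

# Hypothetical instance-index table: instance -> is-a paths of its concepts
instance_table = {
    "jaguar": [["Thing", "Animal", "Jaguar_(animal)"], ["Thing", "Vehicle", "Jaguar_(car)"]],
}
# Stand-in for the semantic inverted index files on disk
index_files = {"jaguar_animal.idx": [2], "jaguar_car.idx": [1]}

if __name__ == "__main__":
    print(search(root, instance_table, index_files, "jaguar"))
    # {'Jaguar_(animal)': [2], 'Jaguar_(car)': [1]}
```

Storing children in a dict keyed by concept name means each step of the descent is a single lookup, so locating a concept's file costs roughly one lookup per tree level instead of a scan over a sequentially stored index.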
Keywords/Search Tags: Semantic indexing, Ontology concept tree, Ontology, Index classification, VSM