
Development And Maintenance Of Full-text Retrieval Web System Based On Lucene

Posted on: 2017-08-05 | Degree: Master | Type: Thesis
Country: China | Candidate: W Y Chi | Full Text: PDF
GTID: 2348330518995592 | Subject: Computer Science and Technology
Abstract/Summary:
Search results are not always satisfactory, because traditional full-text retrieval simply matches the search words at query time. To improve retrieval performance, this thesis uses the semantic expressiveness of ontologies to design and implement search-term expansion. The main research work covers the following four points:
(1) A semi-automatic ontology construction method is proposed and applied to the news domain, demonstrating its feasibility. First, the news-domain vocabulary is extracted by data mining and text clustering. Next, binary concept-attribute relations are formed from the sememes of HowNet concepts. Then, relationships among the clustered words are derived with formal concept analysis (FCA) concept-lattice theory. Finally, the resulting ontology is encoded in the OWL ontology language using Jena, completing the construction of the domain ontology.
(2) An improved BDMM (bidirectional maximum matching) algorithm for Chinese word segmentation is designed and implemented. Segmentation has two main difficulties: recognizing unknown words and resolving ambiguity. For unknown words, when segmentation yields a run of consecutive single characters, the run is treated as an unknown word and added to the dictionary. For ambiguous strings, all occurrences of the same string in the training text are identified, and the segmentation that accounts for the higher proportion is taken as correct. A comparison with standard BDMM on data supplied by Shanxi University shows that the improved algorithm achieves better recall and precision.
(3) Building on these two components, an ontology-based full-text retrieval web system is designed and implemented to further improve retrieval performance. Term-concept mapping removes the restriction that search terms must come from a controlled vocabulary, and word association analysis of the local context addresses the problem of too many expansion terms. The system is built on Lucene with the SSH (Struts, Spring, Hibernate) framework; the ontology expansion, word association analysis and term-concept mapping modules are added one at a time, and the resulting incremental performance comparisons show that the system performs better.
(4) Security maintenance is performed on the web system: it is hardened against several common web-application vulnerabilities, and scans with the AppScan security testing tool before and after the hardening scheme is applied show that the scheme is feasible.
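Point (1) derives relationships between clustered words with FCA concept lattices. As an illustration only, the Python sketch below enumerates the formal concepts of a small formal context in which each word is described by a set of sememe-like attributes, relying on the standard fact that concept intents are exactly the intersections of object intents (plus the full attribute set). The words and attribute labels are hypothetical, not HowNet data, and neither the thesis's own lattice-construction algorithm nor the Jena/OWL encoding step is reproduced here.

```python
def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a formal context.

    context: dict mapping each object (e.g. a word) to the set of
    attributes (e.g. hypothetical sememe labels) it has.
    Intents are closed under intersection, so start from the full
    attribute set plus every object intent and intersect to a fixpoint.
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs} | {frozenset(attrs) for attrs in context.values()}
    changed = True
    while changed:
        changed = False
        for a in list(intents):
            for b in list(intents):
                c = a & b
                if c not in intents:
                    intents.add(c)
                    changed = True
    concepts = []
    for intent in intents:
        # Extent = every object that has all attributes of the intent.
        extent = frozenset(o for o, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return concepts


def cover_pairs(concepts):
    """Direct sub/superconcept pairs, i.e. the edges of the Hasse diagram.

    (A1, B1) is below (A2, B2) when A1 is a proper subset of A2; a pair is
    kept only if no third concept lies strictly between the two.
    """
    pairs = []
    for lo_ext, lo_int in concepts:
        for hi_ext, hi_int in concepts:
            if lo_ext < hi_ext and not any(
                lo_ext < mid_ext < hi_ext for mid_ext, _ in concepts
            ):
                pairs.append(((lo_ext, lo_int), (hi_ext, hi_int)))
    return pairs
```

Each cover pair is a candidate broader/narrower link: with the hypothetical context {football: {sport, ball}, basketball: {sport, ball}, swimming: {sport, water}}, the concept with intent {sport} sits directly above the one with intent {sport, ball}, which an ontology builder might read as a hypernym relation.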
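The minimal Python sketch below shows classic bidirectional maximum matching together with the unknown-word heuristic described in point (2): a run of consecutive single-character tokens is merged into one word and added to the dictionary. The maximum word length (4), the minimum run length (2) and the plain-set dictionary are assumptions. For ambiguity it uses the common BDMM tie-break (fewer words, then fewer single characters, preferring the backward result), not the training-text proportion method the thesis describes.

```python
MAX_LEN = 4  # assumed maximum dictionary word length, in characters


def forward_mm(text, dictionary, max_len=MAX_LEN):
    """Forward maximum matching: take the longest dictionary word at each position."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:  # single chars always match
                words.append(piece)
                i += size
                break
    return words


def backward_mm(text, dictionary, max_len=MAX_LEN):
    """Backward maximum matching: the same idea, scanning from the end."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                words.append(piece)
                j -= size
                break
    words.reverse()
    return words


def choose(fwd, bwd):
    """Classic BDMM tie-break: fewer words, then fewer single characters.

    Backward wins when both scores are equal.
    """
    if fwd == bwd:
        return fwd

    def score(ws):
        return (len(ws), sum(1 for w in ws if len(w) == 1))

    return fwd if score(fwd) < score(bwd) else bwd


def merge_single_runs(words, dictionary, min_run=2):
    """Hypothetical unknown-word step: merge a run of >= min_run
    single-character tokens into one word and add it to the dictionary."""
    merged, run = [], []
    for w in words + [None]:  # None is a sentinel that flushes the last run
        if w is not None and len(w) == 1:
            run.append(w)
            continue
        if len(run) >= min_run:
            new_word = "".join(run)
            dictionary.add(new_word)
            merged.append(new_word)
        else:
            merged.extend(run)
        run = []
        if w is not None:
            merged.append(w)
    return merged


def segment(text, dictionary):
    """Segment text with BDMM, then apply the single-character-run heuristic."""
    best = choose(forward_mm(text, dictionary), backward_mm(text, dictionary))
    return merge_single_runs(best, dictionary)
```

For example, with the dictionary {"我们", "喜欢"}, the string 我们喜欢张三 segments to 我们 / 喜欢 / 张三, and 张三 is added to the dictionary. The merging rule will also glue together genuine single-character words, so in practice a merged run would likely need further filtering (for example by frequency) before it is accepted.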
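Point (3) combines term-concept mapping, ontology-based expansion and word association analysis over the local context. The Python sketch below is a hypothetical simplification, not the thesis's implementation: a synonym table maps the user's term to an ontology concept, the concept's ontology neighbours are the expansion candidates, and only the candidates that co-occur most often with the concept in the top-ranked documents (each represented as a set of tokens) are kept. The synonym table, neighbour lists, cut-off `k` and the 0.5 boost are all assumptions. The result is rendered as a string in Lucene's classic query-parser syntax (quoted terms joined by `OR`, with `^` boosts).

```python
def map_term_to_concept(term, synonyms):
    """Term-concept mapping (hypothetical): look the user's term up in a
    synonym table; unmapped terms are used as concepts unchanged."""
    return synonyms.get(term, term)


def expand_query(term, synonyms, ontology, top_docs, k=2):
    """Return the mapped concept plus at most k ontology neighbours, ranked by
    how many top-ranked documents contain both the concept and the neighbour.

    top_docs: list of sets of tokens (the local context).
    Neighbours that never co-occur with the concept are discarded.
    """
    concept = map_term_to_concept(term, synonyms)
    scored = []
    for candidate in ontology.get(concept, []):
        count = sum(1 for doc in top_docs if concept in doc and candidate in doc)
        if count > 0:
            scored.append((count, candidate))
    # list.sort is stable, so ties keep the ontology's neighbour order.
    scored.sort(key=lambda pair: -pair[0])
    return [concept] + [candidate for _, candidate in scored[:k]]


def to_lucene_query(terms, boost=0.5):
    """Render terms in Lucene classic query-parser syntax: quoted terms joined
    by OR, with expansion terms down-weighted by a ^boost factor.
    Quote characters inside terms are not escaped in this sketch."""
    original, *expansions = terms
    parts = [f'"{original}"'] + [f'"{t}"^{boost}' for t in expansions]
    return " OR ".join(parts)
```

Because candidates that never appear alongside the concept in the top-ranked documents are dropped and the rest are capped at `k`, the number of expansion terms stays small. Down-weighting them with a boost below 1 keeps the user's original term dominant in the ranking.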
Keywords/Search Tags: ontology, full-text retrieval, Lucene, Chinese word segmentation, association analysis