
The Research And Implementation On The Medical Site In-site Search Engine Basing On Double Tokenizers

Posted on: 2015-12-20
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Yao
Full Text: PDF
GTID: 2298330431492684
Subject: Computer application technology
Abstract/Summary:
In-site search is an important branch of universal (general-purpose) search engine technology. An in-site search engine dedicated to a medical information service website is a necessity for its users, doctors and patients alike, who need to find and identify information quickly and conveniently. At the same time, an in-site search engine for a medical information website should perform better than a universal search engine. For these two reasons, designing and implementing an in-site search engine for a medical information website has both research value and practical value.

This thesis first introduces the background and significance of the work, then covers related knowledge and several approaches that can be used to build an in-site search engine. After comparing them, we choose universal search technology to solve the in-site search problem. Given that choice, we give a full introduction to universal search technology, its key techniques, and the open-source options available for building a universal search engine. Based on this introduction and a clear comparison, the thesis selects the open-source search engine architecture Nutch as the base for secondary development of our in-site search engine. Building on this research and analysis, the thesis concentrates on two parts.

First, according to the basic features of the medical information service website and the specific habits of its users, the thesis proposes a new in-site search engine architecture based on double tokenizers. The model aims to solve problems that always arise when discussing a highly efficient in-site search engine, such as fuzzy searching, pinyin searching, and result ranking. It customizes and redevelops Nutch by embedding a second tokenizer with a special dictionary. This special tokenizer can process almost all user input.

Second, the thesis presents a detailed comparative experiment between the in-site search engine using double tokenizers and the in-site search engine using only one tokenizer. The experiment compares the performance of the two models using indexes such as search time, search precision rate, and a ranking reasonableness index. It also analyzes the behavior of the two models on several long queries composed of different types of search keywords. The experiment shows that the double-tokenizer model has an evident advantage over the single-tokenizer model. The experiment also tests whether the double-tokenizer search engine can correctly process queries written in pinyin and queries that are mistyped because of pinyin input. The results show that both types of query are processed precisely.
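The abstract does not describe how the second tokenizer works beyond saying that it carries a special dictionary. As a rough illustration of the general idea only, the sketch below chains a simple first tokenizer with a dictionary-based second tokenizer that maps pinyin spellings to known terms and otherwise applies forward maximum matching. All names, the toy dictionaries, and the matching strategy are assumptions made for this example; none of them come from the thesis or from the Nutch code base.

```python
# Hypothetical sketch of a two-stage ("double") tokenizer.
# Names and data are illustrative assumptions, not the thesis's implementation.

# Toy domain dictionary of medical terms (assumed data).
MEDICAL_TERMS = {"感冒", "高血压", "糖尿病"}

# Toy lookup from pinyin spelling to dictionary term (assumed data).
PINYIN_TO_TERM = {
    "ganmao": "感冒",
    "gaoxueya": "高血压",
    "tangniaobing": "糖尿病",
}


def primary_tokenize(text):
    """First tokenizer: lowercase the query and split on whitespace."""
    return text.lower().split()


def dictionary_tokenize(token, terms=MEDICAL_TERMS, pinyin=PINYIN_TO_TERM):
    """Second tokenizer: resolve pinyin, then forward maximum matching."""
    if token in pinyin:
        # The whole token is a known pinyin spelling of a dictionary term.
        return [pinyin[token]]
    if token.isascii():
        # Other ASCII tokens are passed through unchanged.
        return [token]
    max_len = max(len(t) for t in terms)
    pieces = []
    i = 0
    while i < len(token):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(token) - i), 0, -1):
            piece = token[i:i + size]
            if piece in terms or size == 1:
                pieces.append(piece)
                i += size
                break
    return pieces


def double_tokenize(query):
    """Run the first tokenizer, then refine each token with the second."""
    tokens = []
    for tok in primary_tokenize(query):
        tokens.extend(dictionary_tokenize(tok))
    return tokens
```

In this sketch a pinyin query and its Chinese-character form reduce to the same dictionary term, which is one plausible way for both to reach the same index entries.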
Keywords/Search Tags: medical information website, in-site searching, double tokenizer architecture, pinyin searching, search result ranking