
Design And Implementation Of Search Engine For Electronic Commerce In Biology Field

Posted on: 2018-05-18; Degree: Master; Type: Thesis
Country: China; Candidate: S Z Song
GTID: 2348330566955073; Subject: Computer technology
Abstract/Summary:
With the popularization and development of the Internet, e-commerce has become a new business model. An e-commerce website can list thousands of goods, and without a search engine users find it difficult to locate the goods they want. It is therefore necessary for an e-commerce website to build its own search engine. There are three ways to do so. The first is to rely on a generic search engine such as Google or Baidu, but the results are not ideal and the approach is inflexible. The second is to use an existing open-source framework, which is constrained by the framework and is not flexible enough. The third is to build a custom search engine on existing technology; this scheme is flexible and efficient, solves the commodity search problem on e-commerce websites well, and is the method adopted by most e-commerce websites today. It is also the scheme adopted in this thesis. The purpose is to provide users with an accurate, comprehensive and rapid commodity inquiry service. The main work is as follows:

(1) Design and implementation of data acquisition and of the commodity web page library module. According to the characteristics of commodities in the biological field, a suitable web page storage format is designed, and the text web page library of biological products is then built. A top-K algorithm is used to deal with duplicated pages. The thesis studies the NLPIR segmentation technology and uses it to segment all pages in the library, so as to generate the commodity lexicon.

(2) Design and implementation of an inverted index for the web pages. Every web page on the server is assigned a unique ID, the TF-IDF algorithm is used to calculate the weight of each word in a page, and the inverted index is then built.

(3) Implementation of the keyword correction module. To improve correction efficiency, an index module is added, since indexing reduces the search range. The shortest edit distance algorithm is then used to correct incorrect words entered by the user.

(4) Design and implementation of the query module. The algorithm for computing web page similarity in the search engine (the cosine similarity theorem) is studied in depth. It is used to measure the similarity between two pages, and the web pages are sorted according to the results and presented to the user.

Finally, the functions of the search engine are tested and analyzed. The test results show that the system runs well and its performance meets expectations; the search engine solves the enterprise's practical problems and has some practical value.
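The pipeline in items (2) to (4) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names (build_index, edit_distance, correct, search), the smoothed IDF formula, the use of raw term counts as the query vector, and the full scan of the lexicon during correction are all assumptions made for illustration. Pages are assumed to be already segmented into tokens (the thesis uses NLPIR for that step), and the index-based narrowing of correction candidates described in item (3) is left out.

```python
import math
from collections import Counter, defaultdict


def build_index(pages):
    """Build an inverted index {term: {page_id: tf-idf weight}}.

    `pages` maps a unique page ID to its list of tokens (already segmented).
    """
    n = len(pages)
    # Document frequency: number of pages in which each term occurs.
    doc_freq = Counter(t for tokens in pages.values() for t in set(tokens))
    index = defaultdict(dict)
    for page_id, tokens in pages.items():
        if not tokens:
            continue
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            # Smoothed IDF (an assumption) so very common terms keep a positive weight.
            idf = math.log(n / doc_freq[term]) + 1.0
            index[term][page_id] = tf * idf
    return index


def edit_distance(a, b):
    """Shortest edit (Levenshtein) distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def correct(word, lexicon):
    """Return `word` if it is in the lexicon, else the closest entry by edit distance."""
    if word in lexicon:
        return word
    # Sorting makes tie-breaking deterministic; every candidate is scanned here.
    return min(sorted(lexicon), key=lambda w: edit_distance(word, w))


def search(query_tokens, index):
    """Return page IDs ranked by cosine similarity to the query, best first."""
    query = Counter(t for t in query_tokens if t in index)
    if not query:
        return []
    # Squared length of every page vector, accumulated from the postings.
    norms = defaultdict(float)
    for postings in index.values():
        for page_id, weight in postings.items():
            norms[page_id] += weight * weight
    # Dot products, touching only pages that share a term with the query.
    dots = defaultdict(float)
    for term, q_weight in query.items():
        for page_id, weight in index[term].items():
            dots[page_id] += q_weight * weight
    q_norm = math.sqrt(sum(v * v for v in query.values()))
    scores = {p: d / (q_norm * math.sqrt(norms[p])) for p, d in dots.items()}
    return sorted(scores, key=lambda p: (-scores[p], p))
```

A query would typically pass each entered keyword through correct(word, set(index)) before calling search, so that a misspelled term is mapped to a lexicon entry before the pages are ranked.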
Keywords/Search Tags:search engine, inverted index, TF-IDF, cosine similarity theorem