Design Of Vertical Search Engine For Academic Resources Of Computer Science

Posted on: 2020-03-08 | Degree: Master | Type: Thesis
Country: China | Candidate: Y X Li | Full Text: PDF
GTID: 2428330578977236 | Subject: Engineering
Abstract/Summary:
With the growth of the Internet and the exponential increase in data volume, every field is flooded with information, and making network retrieval more reliable and more specialized has become an important task. This thesis therefore develops a search engine platform for computer science resources based on vertical search engine technology. It first analyses the current state of research on major search engines in terms of user needs, crawler structure, and word segmentation and indexing, and derives new requirements for this design. It then studies and describes the functions and principles of the core components of a search engine: data acquisition (the web crawler), data processing, the indexer, and the searcher. For computer science content, three aspects of the search engine are optimized. First, the crawler algorithm is optimized by introducing a mechanism by which the crawler distinguishes URLs, which reduces the number of crawls and improves the efficiency of the search engine. Second, the text classification and word segmentation methods are optimized for computer science resources so that the search engine judges information in this field more accurately. Third, the user-friendliness of the system is optimized in light of current research on the search engine results page.
The main work of this thesis falls into the following aspects:
(1) A crawler program and crawling strategy are designed to obtain structured data on computer science resources. A structure tree of the web page code is first constructed to partition the page; the document objects are then located by the XPath of the page element information, and structured data is extracted.
(2) To deal with duplicated and damaged data, the Jaccard algorithm is introduced into the search engine, and a two-step encoding method is proposed for data preprocessing. The idea of Jaccard is to take the ratio of the size of the intersection of two sets to the size of their union as the similarity of the two sets. On this basis, the thesis proposes a duplicate-information filtering method that filters the structured data.
(3) The influence of the distribution of page elements on the search results page on the search experience is studied. On the one hand, new page elements are embedded to make the search results page more vertical and diversified; on the other hand, users' intentions can be predicted from their search behavior data, including eye-movement, cursor, gesture, and acoustic data.
On the basis of the above work, the retrieval function of the vertical search engine and the web crawler are tested to ensure the accuracy of the system data.
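The Jaccard similarity described in the abstract, J(A, B) = |A ∩ B| / |A ∪ B|, can be illustrated with a short Python sketch. This is not the thesis's implementation: the whitespace tokenization, the 0.8 threshold, and the names `token_set`, `jaccard`, and `filter_duplicates` are assumptions made for illustration, and the two-step encoding method is not reproduced because the abstract does not specify it.

```python
def token_set(text: str) -> set:
    """Turn a record into a set of lowercase word tokens (assumed tokenization)."""
    return set(text.lower().split())


def jaccard(a: set, b: set) -> float:
    """Ratio of intersection size to union size; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def filter_duplicates(records: list, threshold: float = 0.8) -> list:
    """Keep a record only if it is less similar than `threshold` to every kept record.

    The threshold value is a hypothetical choice, not taken from the thesis.
    """
    kept = []
    kept_sets = []
    for record in records:
        tokens = token_set(record)
        if all(jaccard(tokens, seen) < threshold for seen in kept_sets):
            kept.append(record)
            kept_sets.append(tokens)
    return kept


if __name__ == "__main__":
    titles = [
        "Introduction to search engines",
        "introduction to search engines",
        "Web crawler design",
    ]
    print(filter_duplicates(titles))
```

Comparing every record with every kept record is quadratic in the number of records, which is acceptable for a sketch but would likely need an index or hashing scheme at crawl scale.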
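The abstract does not say how the crawler distinguishes URLs. One common way to reduce repeated crawls, shown here purely as an assumed illustration, is to normalize each URL and keep a set of those already seen. The class `UrlFilter` and the function `normalize_url` are hypothetical names, and the normalization rules (lowercase host, drop fragment, empty path becomes `/`) are assumptions rather than the thesis's method.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to a canonical form so trivial variants compare equal."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    netloc = parts.netloc.lower()
    path = parts.path or "/"
    # The fragment is dropped because it does not change the fetched page.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


class UrlFilter:
    """Remembers normalized URLs and reports whether a URL is new."""

    def __init__(self) -> None:
        self._seen = set()

    def is_new(self, url: str) -> bool:
        key = normalize_url(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


if __name__ == "__main__":
    url_filter = UrlFilter()
    for candidate in [
        "http://Example.com/a#top",
        "http://example.com/a",
        "http://example.com/b",
    ]:
        print(candidate, url_filter.is_new(candidate))
```

A crawler would call `is_new` before queuing a link and skip any URL for which it returns `False`.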
Keywords/Search Tags: vertical search, text categorization, crawler, Chinese word segmentation