Full Text Search Engine Realizes Data Information Collection

Posted on: 2019-07-24  Degree: Master  Type: Thesis
Country: China  Candidate: K Guo  Full Text: PDF
GTID: 2348330542457688  Subject: Engineering
Abstract/Summary:
With the rapid growth of Internet data, conventional search engines such as full-text search engines return large amounts of information, much of it irrelevant, so users cannot accurately obtain the information they actually want. Vertical search engines emerged to solve this problem. A vertical search engine is a refinement and extension of the full-text search engine: a specialized full-text search engine that helps people obtain domain-specific data. At present, a vertical search engine often takes a long time to index the large amount of data captured from the network, which wastes system resources and prevents the index library from being updated in time. This thesis focuses on two things: using the Baidu Map API to capture a large amount of map point data, and improving the comparison algorithm that the IK tokenizer applies to lexemes in its sorted set.

Baidu Map limits the request rate, the amount of information returned, and the number of requests a user may make for map data, so only a small amount of information can be obtained within a limited number of requests. This thesis narrows the capture range by cutting and splicing rectangles, increases the amount of map point data obtained by increasing the number of requests, and then, through loop traversal, obtains a large amount of map point data in simulation. When a sentence admits several segmentations and is therefore ambiguous, the IK tokenizer always prefers the segmentation with fewer lexemes. It ignores the case in which the segmentation with fewer lexemes is the less important one and the segmentation with more lexemes is the more important one. The improved algorithm therefore repeatedly compares the weights of the two candidate lexemes to strengthen the judgment of ambiguous sentences. It helps the search engine create the index more efficiently, which reduces the time the system spends creating the index and indirectly helps the system update the index library faster and provide users with more detailed services.

Based on an analysis of the principle and workflow of the full-text search engine, this thesis simulates the operation of the traditional web crawler, the topic web crawler, and the API crawling tool, and uses the data each one captures from the network to compare the advantages and disadvantages of the three crawling modes. It analyzes the working principle of Lucene by creating indexes over a large amount of text data, and uses the Lucene scoring mechanism to obtain the score of the input keyword in each document in which it appears. When the simulation models a user entering a keyword, the search engine indexes the input keyword and runs a query. Finally, the query results are printed to the console of the simulation software, ordered by document score.
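The rectangle cutting and splicing idea from the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a query that returns at most `cap` points per rectangle, and it cuts any rectangle that hits the cap into four quadrants, then merges ("splices") the sub-results. The names `split_rect`, `collect_points`, and `make_fake_query` are hypothetical, and `make_fake_query` is an in-memory stand-in rather than a call to the real Baidu Map API.

```python
from typing import Callable, List, Set, Tuple

Point = Tuple[float, float]               # (lng, lat)
Rect = Tuple[float, float, float, float]  # (min_lng, min_lat, max_lng, max_lat)


def split_rect(rect: Rect) -> List[Rect]:
    """Cut a rectangle into four equal quadrants."""
    min_lng, min_lat, max_lng, max_lat = rect
    mid_lng = (min_lng + max_lng) / 2
    mid_lat = (min_lat + max_lat) / 2
    return [
        (min_lng, min_lat, mid_lng, mid_lat),
        (mid_lng, min_lat, max_lng, mid_lat),
        (min_lng, mid_lat, mid_lng, max_lat),
        (mid_lng, mid_lat, max_lng, max_lat),
    ]


def make_fake_query(points: List[Point], cap: int) -> Callable[[Rect], List[Point]]:
    """Hypothetical stand-in for a capped map API: at most `cap` points per call."""
    def query(rect: Rect) -> List[Point]:
        min_lng, min_lat, max_lng, max_lat = rect
        inside = [p for p in points
                  if min_lng <= p[0] <= max_lng and min_lat <= p[1] <= max_lat]
        return inside[:cap]
    return query


def collect_points(rect: Rect, query: Callable[[Rect], List[Point]],
                   cap: int, max_depth: int = 8) -> Set[Point]:
    """Query a rectangle; if the result hits the cap, cut it and recurse."""
    found = query(rect)
    # Fewer results than the cap means nothing was truncated for this rectangle.
    if len(found) < cap or max_depth == 0:
        return set(found)
    merged: Set[Point] = set()
    for sub in split_rect(rect):
        # The set union removes points that sit on a shared edge of two quadrants.
        merged |= collect_points(sub, query, cap, max_depth - 1)
    return merged
```

With a real service, each call to `query` would also count against the rate and request limits the abstract mentions, so the recursion depth bounds the number of requests as well as the smallest rectangle size.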
Keywords/Search Tags: Full-text search engine, Web crawler, Lucene, IK tokenizer