The Research On Structured Information Processing Technology Based On Vertical Search Engines

Posted on: 2014-01-02  Degree: Master  Type: Thesis
Country: China  Candidate: H D Fan  Full Text: PDF
GTID: 2248330398495270  Subject: Computer application technology
Abstract/Summary:
With the development of the Internet, search engines keep pace with the huge volume of information resources but cannot guarantee both the accuracy and the timeliness of search results, so vertical search engines emerged to meet users' needs. Building on an in-depth study of vertical search engines, this thesis proposes an improved classification algorithm and a new duplicate-removal algorithm for a vertical search engine model, with the aim of showing that the new model improves the timeliness and accuracy of vertical search engines and addresses the problems of existing models. The approach adds a secondary data processing module to the common model to complete information management. The chief function of the new module is to convert unstructured and semi-structured data into structured data; its main tasks are the reprocessing and classification of web information. The main research content and contributions fall into the following three points.
(1) Drawing on the vertical search engines widely used in electronic commerce, the thesis proposes an improved vertical search engine model whose practicality and feasibility are demonstrated with two indicators: recall and accuracy.
(2) It proposes a new algorithm for handling duplicate web information and uses four indicators (time complexity, space complexity, recall, and accuracy) to analyze the feasibility and robustness of the algorithm within the improved vertical search engine model, as well as the resulting improvement in the efficiency of information retrieval.
(3) It adopts a new classification algorithm whose data structure consists of a term array and, for each term, a linked list of texts. The term array covers all feature terms obtained by feature extraction over the training texts; what is stored in the array is the ID number of each feature term. Each entry in the term array has a pointer to a linked list of all texts containing that term. Each node of the text linked list has two parts: the ID of the text and the weight of term ti in that text. Once the text linked list of ti is generated, it is sorted in descending order of weight; this narrows the search scope of the original algorithm and thereby optimizes it further.
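The term array and per-term text lists described in point (3) can be illustrated with a small sketch. This is not the thesis's implementation, only one possible reading of the abstract: the names `build_term_index` and `candidate_texts`, the `top_k` cutoff, and the sample weights are all hypothetical, terms are shown as strings rather than ID numbers for readability, and Python lists stand in for linked lists. The abstract does not name the original classifier, so the sketch stops at selecting candidate training texts.

```python
from collections import defaultdict


def build_term_index(doc_weights):
    """Map each feature term to a list of (text_id, weight) pairs.

    doc_weights: {text_id: {term: weight}}, assumed to come from a prior
    feature-extraction step over the training texts.
    Each per-term list is sorted by weight, highest first.
    """
    index = defaultdict(list)
    for text_id, weights in doc_weights.items():
        for term, weight in weights.items():
            index[term].append((text_id, weight))
    for postings in index.values():
        # Descending weight; ties broken by text ID for a stable order.
        postings.sort(key=lambda p: (-p[1], p[0]))
    return dict(index)


def candidate_texts(index, query_terms, top_k):
    """Collect texts from the first top_k entries of each query term's list.

    Assumption: narrowing the search scope means that only the
    highest-weighted texts per term are passed on to the classifier.
    """
    candidates = set()
    for term in query_terms:
        for text_id, _weight in index.get(term, [])[:top_k]:
            candidates.add(text_id)
    return candidates


# Hypothetical training texts with made-up term weights.
training = {
    "d1": {"price": 0.9, "phone": 0.4},
    "d2": {"price": 0.2, "laptop": 0.8},
    "d3": {"phone": 0.7, "laptop": 0.1},
}
index = build_term_index(training)
scope = candidate_texts(index, ["price", "phone"], top_k=1)
```

Because each list is already ordered by weight, taking a prefix of it is enough to restrict the comparison to the texts in which a term matters most, without scanning the whole training set.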
Keywords/Search Tags: Search, Index, Structured, Information Processing, Algorithms