The Research On Structured Information Processing Technology Based On Vertical Search Engines

Posted on:2014-01-02

Degree:Master

Type:Thesis

Country:China

Candidate:H D Fan

Full Text:PDF

GTID:2248330398495270

Subject:Computer application technology

Abstract/Summary:

With the growth of the Internet, general-purpose search engines continue to cope with a huge volume of information resources, but they cannot deliver both accuracy and timeliness in information search, so vertical search engines came into being to meet users' needs. Building on an in-depth study of vertical search engines, this paper proposes an improved classification algorithm and a new duplicate-removal algorithm for the vertical search engine model, and aims to show that the new model further improves the real-time performance and accuracy of vertical search engines and solves problems of the existing models. The strategy is to add a secondary data processing module to the common model to handle information management. The chief function of the new module is to extract unstructured and semi-structured data and convert it into structured data. The module's main tasks are the re-processing and classification of web information. The main research content and innovations fall into the following three points.

(1) With reference to the vertical search engines already widely used in the field of electronic commerce, this paper proposes an improved vertical search engine model whose practicality and feasibility are demonstrated by two indicators, the recall rate and the accuracy rate.

(2) A new algorithm for processing duplicated web information is proposed. Four indicators (time complexity, space complexity, recall and accuracy) are used to analyze the feasibility and robustness of the algorithm within the improved vertical search engine model, and the resulting improvement in information retrieval efficiency.

(3) A new classification algorithm is adopted. Its data structure consists of a term array and, for each term, a linked list of texts. The term array covers all feature terms obtained from the training texts after feature extraction; each array element stores the ID number of a feature term. Each entry in the term array has a pointer to a linked list of the texts that contain it. Each node of the text linked list has two parts: the text ID and the weight of the term ti in that text. After the text linked list for ti is generated, it is sorted in descending order of weight; the search scope of the original algorithm is then determined and narrowed, which optimizes the algorithm further.
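
The data structure in point (3) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the names `TextNode`, `TermEntry`, `build_term_array` and `top_texts` are hypothetical, the abstract does not specify the weighting scheme (the weights here are arbitrary example values), and the `limit` cutoff is only one assumed way in which a weight-sorted list could narrow the search scope.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class TextNode:
    """One node of a term's text linked list: a text ID and the term's weight in that text."""
    text_id: int
    weight: float
    next: Optional["TextNode"] = None


@dataclass
class TermEntry:
    """One element of the term array: a feature term ID and a pointer to its text list."""
    term_id: int
    head: Optional[TextNode] = None


def build_term_array(weights: Dict[int, Dict[int, float]]) -> List[TermEntry]:
    """Build the term array from {text_id: {term_id: weight}} (assumed input shape).

    Each term's text list ends up linked in descending order of weight.
    """
    postings: Dict[int, List[Tuple[int, float]]] = {}
    for text_id, term_weights in weights.items():
        for term_id, weight in term_weights.items():
            postings.setdefault(term_id, []).append((text_id, weight))

    term_array: List[TermEntry] = []
    for term_id in sorted(postings):
        head: Optional[TextNode] = None
        # Visit pairs in ascending weight order and prepend each node,
        # so the finished list runs from highest to lowest weight.
        for text_id, weight in sorted(postings[term_id], key=lambda p: (p[1], -p[0])):
            head = TextNode(text_id, weight, head)
        term_array.append(TermEntry(term_id, head))
    return term_array


def top_texts(entry: TermEntry, limit: int) -> List[Tuple[int, float]]:
    """Return at most `limit` (text_id, weight) pairs from the front of the list.

    Because the list is sorted by descending weight, the walk can stop early
    instead of scanning every text that contains the term.
    """
    result: List[Tuple[int, float]] = []
    node = entry.head
    while node is not None and len(result) < limit:
        result.append((node.text_id, node.weight))
        node = node.next
    return result
```

For example, with three training texts and two feature terms, `build_term_array({1: {10: 0.2, 11: 0.9}, 2: {10: 0.7}, 3: {10: 0.5, 11: 0.1}})` yields a two-entry term array, and `top_texts` on the entry for term 10 with `limit=2` visits only texts 2 and 3, the two with the highest weights.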

Keywords/Search Tags:

Search, Index, Structured, Information Processing, Algorithms

Related items

1	Optimal Search Algorithms for Structured Problems in Natural Language Processing
2	Efficient Algorithms for Search Engine Query Processing
3	Index Compression And Query Processing In Search Engines
4	Index Compression and Efficient Query Processing in Large Web Search Engines
5	Ranked search over structured and semi-structured data
6	Query Processing In Structured Peer-to-Peer Networks
7	Development Of Heterogeneous Data Structured Processing Management Software
8	Research On The Index Technology Of Semi-structured Data
9	Efficient TopK Processing In Web Search Systems
10	Study On Query Processing Techniques In XML Search