Research On Optimization Strategy Of Chinese Professional Search Engine

Posted on: 2007-01-10  Degree: Master  Type: Thesis
Country: China  Candidate: H X Lin
GTID: 2178360212995481  Subject: Computer application technology
Abstract/Summary:
With the rapid growth of information on the web, it has become increasingly difficult for users to find the information they want, and the results they obtain are often inaccurate or outdated. Search engines therefore need further study. The Chinese professional (domain-specific) search engine is an important branch of search engine development and has clear advantages that general-purpose search engines lack. This thesis studies the Chinese professional search engine in three main aspects:

(1) The search strategy of the topic web spider. The search strategy of the topic web spider is the core technology of a Chinese professional search engine, and its quality largely determines the performance of the whole engine. The study found that the Best-First algorithm is well suited to searching for specialized information and performs best among the several search algorithms compared. The algorithm has a flaw, however: it is greedy. It can only find locally optimal results and cannot obtain the globally optimal result.
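The greedy behaviour described above can be illustrated with a minimal Best-First focused-crawler sketch. Everything in it is an assumption for illustration: the toy in-memory web (`TOY_WEB`), the keyword-overlap `relevance` score, the page `budget`, and the rule that an unfetched link inherits its parent page's score. The thesis does not describe its scoring function, and the BF-BF algorithm itself is not shown here.

```python
import heapq
from typing import Dict, List, Set, Tuple

# Hypothetical toy web: URL -> (page text, outgoing links).
Web = Dict[str, Tuple[str, List[str]]]

TOY_WEB: Web = {
    "a": ("chinese search engine", ["b", "c"]),
    "b": ("search engine spider", ["e"]),
    "c": ("sports news", ["d"]),
    "d": ("search engine spider ranking", []),
    "e": ("engine", []),
}
TOPIC: Set[str] = {"search", "engine", "spider"}


def relevance(text: str, topic: Set[str]) -> float:
    """Assumed score: fraction of topic keywords that appear in the page text."""
    words = set(text.lower().split())
    return len(words & topic) / len(topic)


def best_first_crawl(web: Web, seeds: List[str], topic: Set[str], budget: int) -> List[str]:
    """Greedy Best-First crawl: always fetch the queued URL with the highest priority."""
    frontier: List[Tuple[float, str]] = []  # min-heap, so priorities are negated
    queued: Set[str] = set(seeds)
    for url in seeds:
        heapq.heappush(frontier, (0.0, url))  # seeds start with a neutral priority
    visited: List[str] = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        if url not in web:  # dead link in the toy web
            continue
        text, links = web[url]
        visited.append(url)
        score = relevance(text, topic)
        for link in links:
            if link not in queued:
                queued.add(link)
                # Assumption: an unfetched link inherits its parent page's relevance.
                heapq.heappush(frontier, (-score, link))
    return visited


if __name__ == "__main__":
    print(best_first_crawl(TOY_WEB, ["a"], TOPIC, budget=3))
```

On this toy web, with a budget of three pages, the crawler visits `a`, `b` and `e` and never reaches `d`, the most relevant page, because `d` is only reachable through the off-topic page `c`. This is the local-optimum trap the abstract attributes to Best-First search.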
To overcome this greediness, the BF-BF algorithm is proposed, with the Best-First algorithm as its foundation. BF-BF remedies the shortcomings of Best-First and finds the globally optimal solution.

(2) The classification and indexing of web documents. Web documents are usually classified with the vector space model (VSM), in which each document is represented as a feature vector. However, it is very difficult to make the feature items in the vector fully independent of one another, and the vector dimension is often very high, which increases the computational load without adding value. To address this, a normalization process is proposed for the document's initial feature vector. This processing not only reduces the vector dimension but also preserves the independence of the feature items.

(3) Optimization of the retrieval module. The retrieval module is the part of the system that users interact with directly, so its optimization bears on the popularity of the search engine. To improve its performance, a system knowledge database and a user information database are introduced into the retrieval module to guide the retrieval process, which can greatly improve the precision of users' searches.
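The abstract does not say how the normalization of the initial feature vector in aspect (2) is carried out. The sketch below assumes one plausible reading: variant or synonymous terms are merged into a single canonical feature through a synonym table, which shrinks the VSM dimension and removes one obvious source of dependence between features. The synonym table `SYNONYMS`, the toy documents and all function names are hypothetical; the input is assumed to be already word-segmented (Chinese text would need word segmentation first), and vectors use raw term frequency scaled to unit length.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical synonym table: each variant term maps to one canonical feature.
SYNONYMS: Dict[str, str] = {
    "crawler": "spider",
    "robot": "spider",
    "retrieval": "search",
    "lookup": "search",
}

# Toy documents, assumed to be already word-segmented.
DOCS: List[List[str]] = [
    ["spider", "search", "engine"],
    ["crawler", "retrieval", "engine"],
    ["robot", "lookup", "system"],
]


def canonicalize(doc: List[str], synonyms: Dict[str, str]) -> List[str]:
    """Map every token to its canonical term; tokens without an entry are kept."""
    return [synonyms.get(token, token) for token in doc]


def vocabulary(docs: List[List[str]]) -> List[str]:
    """Distinct terms across all documents, i.e. the dimensions of the VSM."""
    return sorted({token for doc in docs for token in doc})


def tf_vector(doc: List[str]) -> Dict[str, float]:
    """Sparse term-frequency vector scaled to unit Euclidean length."""
    counts = Counter(doc)
    if not counts:
        return {}
    length = math.sqrt(sum(c * c for c in counts.values()))
    return {term: c / length for term, c in counts.items()}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Dot product of two unit vectors, which equals their cosine similarity."""
    return sum(weight * v.get(term, 0.0) for term, weight in u.items())


if __name__ == "__main__":
    merged = [canonicalize(doc, SYNONYMS) for doc in DOCS]
    print(len(vocabulary(DOCS)), "->", len(vocabulary(merged)))  # dimension before/after
    print(cosine(tf_vector(DOCS[0]), tf_vector(DOCS[1])),
          cosine(tf_vector(merged[0]), tf_vector(merged[1])))
```

On the toy data the vocabulary shrinks from eight dimensions to four, and the first two documents, which share only `engine` before merging (cosine 1/3), become identical afterwards. A real system would need a much larger term table and would likely weight terms by TF-IDF rather than raw frequency.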
Keywords/Search Tags: Topic Web Spider, BF-BF Algorithm, Chinese Professional Search Engine, Retrieval, VSM