The Study and Implementation on the Key Problems of Intelligent Search Engine Technology

Posted on: 2012-03-12 | Degree: Master | Type: Thesis
Country: China | Candidate: W Wang
GTID: 2178330332994841 | Subject: Computer technology
Abstract/Summary:
With the rapid development of Internet technology, web information resources have grown explosively, and the number of Internet users is increasing at an alarming rate. More and more users have become accustomed to retrieving online information through search engines. As search engines have become widespread, people are no longer satisfied with traditional search engines; they want search engines to be more intelligent, more user-friendly, and more precise. These new demands place higher requirements on search engine technology. This thesis examines current research on intelligent search engine technology and focuses on the following key issues:
1) A dynamic web page information gathering technique based on a website priority adjustment algorithm was proposed and implemented. The average freshness of sampled pages is measured and used to adjust each website's priority level dynamically, and the frequency at which that website's pages are fetched is adjusted accordingly.
2) The relationship between the density of Chinese characters in a web page's source code and the page's main text was studied, and a text extraction algorithm based on Chinese character density was proposed and implemented. The algorithm removes the dependence of earlier text extraction algorithms on HTML tags and, with the help of rules, extracts web page text quickly and efficiently.
3) Several issues in automatic text classification were investigated. A hash-table-based dynamic vector dimension reduction technique was proposed and implemented, together with an improved vector cosine similarity algorithm. The concept-subject indexing specific to web documents was also analyzed, and a vector space model based on keyword vectors was constructed.
The application of these algorithms to automatic text categorization was also discussed.
4) Automatic text categorization was studied with a view to web page classification, and two classification algorithms were presented: a category center vector algorithm and CKNN (Clustering K-Nearest Neighbors).
5) A document structure model based on the vector space model was investigated, and vector cosine similarity was applied to automatic text extraction, yielding a similarity-based text extraction algorithm.
Finally, the results of this study were used to build a dynamic web page information acquisition system that integrates web page text extraction, automatic web page classification, web page subject extraction, and web document summarization. The system is therefore real-time and adaptive.
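The abstract does not say how freshness is measured or how priority maps to fetch frequency. The following Python sketch illustrates one possible reading of item 1. It assumes that freshness is the fraction of sampled pages whose stored copy still matches the live page, that priority is an integer level from 1 to 5, and that each level above the minimum halves the revisit interval. `Site`, `adjust_priority`, `revisit_interval_hours` and all thresholds are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical parameters: the thesis does not state its levels or thresholds.
MIN_LEVEL, MAX_LEVEL = 1, 5
BASE_INTERVAL_HOURS = 48.0  # revisit interval at the lowest priority level


@dataclass
class Site:
    name: str
    level: int = 3  # start in the middle of the priority range


def average_freshness(samples):
    """Fraction of sampled pages whose stored copy still matches the live page.

    `samples` is a list of (stored_hash, live_hash) pairs, one per sampled page.
    """
    if not samples:
        return 1.0
    fresh = sum(1 for stored, live in samples if stored == live)
    return fresh / len(samples)


def adjust_priority(site, samples, low=0.5, high=0.9):
    """Raise priority when sampled copies are stale, lower it when they stay fresh."""
    f = average_freshness(samples)
    if f < low:
        site.level = min(MAX_LEVEL, site.level + 1)
    elif f > high:
        site.level = max(MIN_LEVEL, site.level - 1)
    return f


def revisit_interval_hours(site):
    """Each priority level above the minimum halves the revisit interval."""
    return BASE_INTERVAL_HOURS / (2 ** (site.level - MIN_LEVEL))
```

For example, if three of four sampled pages have changed, the average freshness is 0.25, so a site at level 3 moves to level 4 and is revisited every 6 hours.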
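Item 2 can be illustrated with a minimal sketch of density-based extraction. The sketch assumes that density is measured per line of raw HTML source as the share of Chinese characters (basic CJK block, U+4E00 to U+9FFF) among non-whitespace characters, and that the "rules" are two hypothetical thresholds. The thesis's actual density definition and rules may differ. `extract_text`, `line_density`, `min_density` and `min_chinese` are illustrative names.

```python
import re

# Removes any HTML tag from a kept line.
TAG = re.compile(r"<[^>]+>")


def is_chinese(ch):
    """True for characters in the basic CJK Unified Ideographs block (a simplification)."""
    return "\u4e00" <= ch <= "\u9fff"


def line_density(line):
    """Share of Chinese characters among all non-whitespace characters of a raw source line."""
    chars = [c for c in line if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if is_chinese(c)) / len(chars)


def extract_text(html, min_density=0.3, min_chinese=10):
    """Keep source lines dense in Chinese characters, then strip their tags.

    Both thresholds are hypothetical rules, not values taken from the thesis.
    """
    kept = []
    for line in html.splitlines():
        n_chinese = sum(1 for c in line if is_chinese(c))
        if n_chinese >= min_chinese and line_density(line) >= min_density:
            kept.append(TAG.sub("", line).strip())
    return "\n".join(kept)
```

Navigation bars and scripts contain few Chinese characters relative to their markup, so they fall below the thresholds, while paragraph lines of body text pass. Because the decision uses character counts rather than specific tag names, the sketch does not depend on the page's tag structure.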
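For item 3, one plausible interpretation of hash-table-based dynamic dimension reduction is a sparse vector that stores only the terms that occur as keys, so each document's dimension adapts to its own vocabulary. The sketch below shows that idea with Python dicts plus an optional, hypothetical `max_terms` cut-off. It uses the standard cosine similarity, not the improved variant from the thesis, whose details the abstract does not give.

```python
import math
from collections import Counter


def to_sparse_vector(tokens, max_terms=None):
    """Term-frequency vector stored in a hash table: only occurring terms get a key.

    `max_terms` (hypothetical) keeps only the most frequent terms, further
    reducing the vector's dimension.
    """
    counts = Counter(tokens)
    if max_terms is not None:
        return dict(counts.most_common(max_terms))
    return dict(counts)


def cosine(u, v):
    """Standard cosine similarity between two hash-table vectors.

    Only keys shared by both vectors contribute to the dot product, so the cost
    depends on the number of non-zero terms, not on the vocabulary size.
    """
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller table
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Iterating over the smaller table keeps the dot product proportional to the number of non-zero terms, which is the practical benefit of a hash-table representation for high-dimensional text vectors.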
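Item 4 names two classifiers. The abstract does not describe how the clustering step in CKNN works, so the sketch below shows only the two standard building blocks it appears to combine: classification by category center (centroid) vectors, and a plain k-nearest-neighbour vote. Both use cosine similarity over dict vectors. The function names and the `(vector, label)` training format are assumptions.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity between two dict vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroids(training):
    """Category center vectors: the mean of each category's document vectors.

    `training` is a list of (vector, label) pairs with dict vectors.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for vec, label in training:
        counts[label] += 1
        for term, w in vec.items():
            sums[label][term] += w
    return {label: {t: w / counts[label] for t, w in s.items()} for label, s in sums.items()}


def classify_by_center(vec, centers):
    """Assign the category whose center vector is most cosine-similar to `vec`."""
    return max(centers, key=lambda label: cosine(vec, centers[label]))


def classify_knn(vec, training, k=3):
    """Plain k-nearest-neighbour vote by cosine similarity (no clustering step)."""
    neighbours = sorted(training, key=lambda item: cosine(vec, item[0]), reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

A centroid classifier needs only one comparison per category, while kNN compares the query with every training document. Clustering the training set, as the name CKNN suggests, is one common way to reduce that cost.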
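Item 5 applies cosine similarity to text extraction. Because the system description and keywords mention web page summaries, the sketch below assumes an extractive summary: it scores each sentence by cosine similarity to the whole-document vector and returns the top `n` sentences in their original order. The sentence splitter, English lower-case tokenisation, and the name `summarize` are simplifying assumptions. Chinese pages would first need word segmentation.

```python
import math
import re
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def summarize(text, n=2):
    """Pick the n sentences most similar to the whole document, in original order."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    vectors = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    doc_vector = sum(vectors, Counter())  # term counts of the whole document
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(vectors[i], doc_vector), reverse=True)
    chosen = sorted(ranked[:n])  # restore document order
    return [sentences[i] for i in chosen]
```

Sentences that share many frequent terms with the rest of the document score highest, so off-topic sentences tend to be left out of the summary.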
Keywords/Search Tags: Intelligent search engine technology, Dynamic web page information acquisition system, Web page text extraction, Web page classification algorithm, Web page summary, Vector dynamic dimension reduction