
Design And Realization Of The Search Engine System For Campus Network

Posted on: 2011-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: H Chen
Full Text: PDF
GTID: 2178360308997463
Subject: Software engineering
Abstract/Summary:
China is currently investing heavily in vocational education, so higher vocational education is developing rapidly and institutions continue to grow in scale. As the management systems of higher vocational institutions mature and reform deepens, building a digital campus has become essential to improving management, teaching quality, and efficiency. It is the only way to develop information management in higher vocational education, and schools are therefore accelerating their information technology construction. As campus network content has been continuously enriched, working efficiency and teaching quality have improved considerably. However, as campus network hardware and software have developed, the volume of information on campus networks, which exist to share school information resources, has grown explosively, while the information that any individual user is actually interested in is only a drop in the bucket. It is becoming increasingly difficult for users, both inside and outside the campus network, to find valuable information in this vast body of content. The campus network search engine emerged to address this problem.

With the rapid expansion of campus networks, the search engine has become an important navigation tool for them. Because all campus sites reside at addresses under the school website's domain name, the search engine's web crawler can collect the information resources of the campus network once its operating environment has been set up and debugged and the crawl scope and seed URLs have been configured.
A reasonable crawl period must be set according to how quickly campus network resources are updated, so that the crawler continues to discover and collect new sites and pages within the campus network. The API provided by the Java library HTMLParser is used to extract specific text from the pages the crawler collects. On this basis, the open source Lucene engine architecture and the Chinese word segmentation component JE-analysis are used to analyze, extract, organize, and process this text. The resulting index files provide users with information retrieval services and thereby serve the purpose of navigation. Because the search engine is developed for the specific needs of the campus network, it is better able to meet users' needs when they search for campus information. The campus network search engine compensates for the shortcomings of general-purpose search engines, whose results are overly broad and contain duplicates and spam, and thus provides users with a more accurate, personalized service.

Based on an analysis of the current state of the campus network, this thesis completes the following work:
(1) Based on an in-depth understanding of how search engines operate, identify the open source components and related technologies needed to implement the system.
(2) Based on the actual needs of the campus network of a college in Guangdong province, produce a requirements analysis document.
(3) Design the overall system architecture according to the design objectives and principles, and define the system's operating workflow.
(4) Customize and extend the Heritrix web crawler to crawl resources within the campus network.
(5) Design and use an API to extract and process the information collected by the web crawler.
(6) Based on an in-depth study of Lucene and related technologies, modify and extend them until they can be used in this system, ultimately implementing search and retrieval services for the campus network.
(7) Design the system's entry page and the test cases.
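
The pipeline described in this abstract (crawl only pages under the campus domain, extract page text, build an index, answer queries) can be illustrated with a minimal, self-contained Java sketch. It is not the thesis implementation: it uses only the Java standard library as a simplified stand-in for Heritrix, HTMLParser, Lucene, and JE-analysis, and the class name `CampusSearchSketch`, its methods, and the domain `example.edu` are hypothetical names chosen for illustration. In particular, the tokenizer below does no Chinese word segmentation, so a run of Chinese characters would be indexed as a single term.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: scope filter + text extraction + inverted index.
public class CampusSearchSketch {
    // Inverted index: term -> sorted set of page URLs containing the term.
    private final Map<String, Set<String>> index = new HashMap<>();

    /** True when the URL's host is the campus domain or one of its subdomains. */
    public static boolean inScope(String url, String campusDomain) {
        try {
            String host = URI.create(url).getHost();
            if (host == null) return false;
            host = host.toLowerCase(Locale.ROOT);
            return host.equals(campusDomain) || host.endsWith("." + campusDomain);
        } catch (IllegalArgumentException e) {
            return false; // malformed URL: treat as out of scope
        }
    }

    /** Crude tag stripping; a stand-in for a real HTML parser. */
    public static String extractText(String html) {
        return html.replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
                   .replaceAll("<[^>]+>", " ")
                   .replaceAll("\\s+", " ")
                   .trim();
    }

    /** Lowercased runs of letters and digits; a stand-in for a real analyzer. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Indexes a page only if its URL falls inside the campus domain. */
    public void addPage(String url, String html, String campusDomain) {
        if (!inScope(url, campusDomain)) return;
        for (String term : tokenize(extractText(html))) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(url);
        }
    }

    /** Returns the pages that contain every query term (boolean AND). */
    public Set<String> search(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> hits = index.getOrDefault(term, Collections.emptySet());
            if (result == null) result = new TreeSet<>(hits);
            else result.retainAll(hits);
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        CampusSearchSketch engine = new CampusSearchSketch();
        engine.addPage("http://lib.example.edu/hours",
                "<html><body><h1>Library hours</h1></body></html>", "example.edu");
        engine.addPage("http://www.example.com/ad",
                "<p>library spam</p>", "example.edu"); // out of scope, skipped
        System.out.println(engine.search("library hours"));
    }
}
```

Restricting indexing to the campus domain is what keeps off-campus duplicates and spam out of the results in this sketch. A production system would replace each stand-in with the corresponding component named in the abstract.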
Keywords/Search Tags: search engine, retrieval services, web crawler, Lucene
Related items