Network Resources Searching Study Based On Content's Directory Tree

Posted on: 2011-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y K Wang
GTID: 2178360308464795
Subject: Software engineering

Abstract/Summary:
With the rapid development of the Internet, information can be published and shared beyond the limits of time and space, and people have entered an era of "information explosion". The rapid growth of online information has brought people not only a wealth of information but also difficulty in finding it. Without a powerful tool to help them find and discover useful information, people would be buried in an ocean of online data. The search engine is the technology that emerged to meet this need: it follows a strategy to collect information from the Internet, understands that information, and extracts, organizes, and processes it, which makes it the best bridge between users and online information. Search engine technology draws on natural language understanding, Chinese word segmentation, artificial intelligence, machine learning, and related fields. This paper first describes the research background and significance of directory tree search, and then analyzes current research on it. The paper covers the preprocessing of the collected text: a web crawler that collects web pages, Chinese word segmentation, hierarchical classification, and the construction of the directory tree.
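The abstract does not name the segmentation algorithm the thesis uses, so the following is only a sketch of one common dictionary-based approach, forward maximum matching: scan the text left to right and, at each position, take the longest word found in a dictionary. The function name and the toy dictionary are hypothetical; a real system would load a large lexicon.

```python
from __future__ import annotations


def forward_max_match(text: str, dictionary: set[str]) -> list[str]:
    """Forward maximum matching: scan left to right, always taking the longest
    dictionary word that starts at the current position."""
    # The longest dictionary entry bounds how far ahead we ever need to look.
    max_len = max((len(w) for w in dictionary), default=1)
    words: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted, so unknown characters
            # become one-character tokens and the scan always advances.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary for illustration only.
DICTIONARY = {"中文", "分词", "搜索", "引擎", "搜索引擎", "技术"}

print(forward_max_match("中文分词技术", DICTIONARY))  # ['中文', '分词', '技术']
print(forward_max_match("搜索引擎技术", DICTIONARY))  # ['搜索引擎', '技术']
```

Because the matching is greedy, it can mis-segment ambiguous strings; backward maximum matching or statistical models are the usual alternatives when that matters.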
The paper analyzes each stage of this data-mining process from a theoretical perspective: for the web crawler, its basic principles and crawling strategies; for Chinese word segmentation, the principles of the algorithm's design and its implementation; and for hierarchical classification and directory tree construction, the basic principles, the algorithm used, and its characteristics, including the calculation of text correlation and the application of the Huffman algorithm. After preprocessing the text, we built a resource library, then designed and implemented a search engine, and described the search steps from input to output. Beyond explaining the web crawler, Chinese word segmentation, hierarchical classification, directory tree construction, and the search engine in theory, this paper also puts these principles and algorithms into experimental practice. The Chinese word segmentation system was used to segment more than 14 million Chinese words; part of the extracted text was then hierarchically classified, a directory tree was built from it, and the result was stored in the database; finally, a search engine was designed to search for information over the constructed directory tree. The paper also designed a number of functional test cases to test the whole system, focusing mainly on boundary conditions. The tests confirmed the system's availability and stability, showing that the theory can be put into practice.
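The abstract mentions text correlation and hierarchical classification into a directory tree but does not say which measure or procedure is used. The sketch below is therefore built on assumptions: correlation is approximated with cosine similarity between term-frequency vectors of segmented words, and classification descends the tree top-down, moving at each level to the most similar child category. The `CategoryNode` type, its keyword lists, and the category names are hypothetical, and the Huffman step mentioned in the abstract is not modelled here.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CategoryNode:
    name: str
    keywords: list[str]  # hypothetical: representative terms describing the category
    children: list[CategoryNode] = field(default_factory=list)


def cosine_similarity(words_a: list[str], words_b: list[str]) -> float:
    """Cosine of the angle between two term-frequency vectors (0.0 if either is empty)."""
    tf_a, tf_b = Counter(words_a), Counter(words_b)
    dot = sum(tf_a[w] * tf_b[w] for w in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_top_down(doc_words: list[str], root: CategoryNode) -> list[str]:
    """Descend the category tree, moving at each level to the most similar child.
    Returns the path of category names from the root to the chosen node."""
    path = [root.name]
    node = root
    while node.children:
        scores = [(cosine_similarity(doc_words, c.keywords), c) for c in node.children]
        best_score, best_child = max(scores, key=lambda pair: pair[0])
        if best_score == 0.0:
            break  # no child shares a term with the document; stop at this level
        node = best_child
        path.append(node.name)
    return path


# Hypothetical two-level directory tree.
tree = CategoryNode("All", [], [
    CategoryNode("Computing", ["software", "network", "search", "engine"], [
        CategoryNode("Search engines", ["search", "engine", "crawler"]),
        CategoryNode("Databases", ["database", "index", "query"]),
    ]),
    CategoryNode("Sports", ["football", "match", "team"]),
])

doc = ["search", "engine", "crawler", "web"]  # words from a segmented page
print(classify_top_down(doc, tree))  # ['All', 'Computing', 'Search engines']
```

Top-down descent keeps each decision local to one level of the tree, so only the children of the current node are scored; the trade-off is that an early wrong branch cannot be corrected further down.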
Keywords/Search Tags: Crawler, Chinese Word Segmentation, Hierarchical Classification, Directory Tree, Search Engine