
Research And Implementation Of Several Key Technologies In Intelligent Chinese Search Engine

Posted on: 2007-12-26   Degree: Master   Type: Thesis
Country: China   Candidate: Z M Pan
GTID: 2178360182493744   Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, search engines have become an essential tool for obtaining information online. Most traditional search engines rely on keyword matching, which cannot efficiently capture the association between a web page and a keyword. For related-term suggestion, traditional search engines simply add prefixes or suffixes to the query keywords. This approach does not reach the level of semantic understanding, reflects the user's intent poorly, and needs improvement. Understanding Chinese web content well, improving the relevance between keywords and web pages, and providing semantically grounded related-term suggestions have therefore become key problems that a new intelligent Chinese search engine must urgently solve. This thesis studies an intelligent Chinese search engine system in depth and implements several of its key technologies. The main contributions are as follows:

1) We design a trie-like data structure for the Chinese word segmentation dictionary and propose a new two-phase approach to Chinese word segmentation. In the first phase, ambiguity elimination and named entity identification are performed with a bigram model combined with certain rules. In the second phase, new words are identified with the Maximum Matching Method. Our experimental results show that the scheme greatly improves the performance and precision of the dictionary, providing a basis for high-quality concept word generation and subsequent web page processing.

2) We put forward a novel approach to related-term suggestion based on concept clustering, which differs from that of traditional search engines. Using the concept clustering results together with the weights of keywords in web pages, we present a new ranking algorithm for web pages, called CCR (Concept Cluster Rank). Our experimental results show that CCR reflects the relevance between keywords and web pages well and retrieves results at the semantic level, while also extending the depth and breadth of retrieval. In addition, we optimize the user query module with synonym word lists, which provides synonym suggestions.

3) We design the framework of the intelligent Chinese search engine system and offer a feasible implementation scheme. We also propose revised algorithms for PageRank computation, concept cluster generation, and index construction in a massive-data environment. Finally, we implement a prototype system on a large, real-world server cluster and report detailed experimental results.
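The trie-based dictionary and the Maximum Matching Method in contribution 1) can be illustrated with a minimal sketch. It assumes a plain character-level trie and greedy forward maximum matching; the names `TrieNode`, `Dictionary`, `longest_match`, and `forward_maximum_matching` are hypothetical, and the thesis's own trie-like layout and its use of maximum matching for new-word identification may differ.

```python
from typing import Dict, List


class TrieNode:
    """One node per character; is_word marks the end of a dictionary entry."""

    __slots__ = ("children", "is_word")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


class Dictionary:
    """Hypothetical segmentation dictionary stored as a character trie."""

    def __init__(self, words: List[str]) -> None:
        self.root = TrieNode()
        for w in words:
            self.add(w)

    def add(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def longest_match(self, text: str, start: int) -> int:
        """Length of the longest dictionary word starting at text[start], or 0."""
        node = self.root
        best = 0
        i = start
        # Walk down the trie as long as the next character continues some entry.
        while i < len(text) and text[i] in node.children:
            node = node.children[text[i]]
            i += 1
            if node.is_word:
                best = i - start
        return best


def forward_maximum_matching(text: str, dic: Dictionary) -> List[str]:
    """Greedy left-to-right segmentation; unmatched characters become 1-char tokens."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        n = dic.longest_match(text, i) or 1
        tokens.append(text[i:i + n])
        i += n
    return tokens
```

With a dictionary containing 研究, 研究生, 生命, and 起源, the input 研究生命起源 is segmented as 研究生 / 命 / 起源 rather than the intended 研究 / 生命 / 起源 ("study the origin of life"). Greedy matching alone cannot resolve such overlapping ambiguities, which is why the first phase relies on a statistical model.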
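The bigram part of the first segmentation phase can be sketched as choosing, among candidate segmentations, the one with the highest bigram log-probability. This sketch assumes add-one (Laplace) smoothing and a toy training corpus; the thesis does not state its smoothing method or its accompanying rules here, and the names `BigramModel`, `log_prob`, and `best` are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence

BOS = "<s>"  # sentence-start marker


class BigramModel:
    """Word bigram model with add-one (Laplace) smoothing."""

    def __init__(self, corpus: Iterable[Sequence[str]]) -> None:
        self.bigrams: Counter = Counter()
        self.contexts: Counter = Counter()  # how often each word precedes another word
        vocab = {BOS}
        for sentence in corpus:
            tokens = [BOS] + list(sentence)
            vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1
        # +1 reserves probability mass for an unseen word type.
        self.vocab_size = len(vocab) + 1

    def log_prob(self, tokens: Sequence[str]) -> float:
        """Smoothed log P(tokens) under the bigram model."""
        seq = [BOS] + list(tokens)
        return sum(
            math.log((self.bigrams[(p, c)] + 1) / (self.contexts[p] + self.vocab_size))
            for p, c in zip(seq, seq[1:])
        )

    def best(self, candidates: List[List[str]]) -> List[str]:
        """Return the candidate segmentation with the highest log-probability."""
        return max(candidates, key=self.log_prob)


model = BigramModel([["研究", "生命", "起源"], ["研究", "生命"], ["研究生", "学习"]])
print(model.best([["研究生", "命", "起源"], ["研究", "生命", "起源"]]))
```

On this toy corpus the model prefers 研究 / 生命 / 起源 over the greedy 研究生 / 命 / 起源, because the bigrams 研究 → 生命 and 生命 → 起源 were observed while 研究生 → 命 was not.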
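Contribution 3) mentions a revised PageRank computation for massive data, whose details are not given in this abstract. For reference, the standard PageRank power iteration that such a revision starts from can be sketched as follows. The damping factor 0.85, the uniform redistribution of rank from pages without out-links, and the function name `pagerank` are assumptions of this sketch, not the thesis's revised algorithm.

```python
from typing import Dict, List


def pagerank(
    links: Dict[str, List[str]],
    d: float = 0.85,
    tol: float = 1e-10,
    max_iter: int = 100,
) -> Dict[str, float]:
    """Standard PageRank by power iteration over an adjacency list of out-links."""
    node_set = set(links)
    for outs in links.values():
        node_set.update(outs)
    nodes = sorted(node_set)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Pages without out-links ("dangling") spread their rank uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - d) / n + d * dangling / n for v in nodes}
        # Every other page splits its damped rank evenly among its out-links.
        for v, outs in links.items():
            if outs:
                share = d * rank[v] / len(outs)
                for u in outs:
                    new[u] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

For the graph `{"a": ["c"], "b": ["c"], "c": ["a"]}`, page c, which receives links from both a and b, ends up ranked highest, while b, which no page links to, keeps only the baseline share (1 - d) / n. The ranks always sum to 1.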
Keywords/Search Tags: Chinese Search Engine, Chinese Word Segmentation, Trie Tree, Concept Cluster, PageRank