General Chinese-English Professional Search Engine: Technology and Applications

Posted on: 2005-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: F Liu
GTID: 2208360122997071
Subject: Computer application technology
Abstract/Summary:
With the growing popularity and development of the Internet, the volume of information available online is increasing geometrically. This brings an abundance of information, and with it an important research problem: how to retrieve useful information from this enormous resource effectively and accurately. Web search engines emerged in response to this need. More recently, general search engines have been unable to satisfy the demand for specialized professional information. Small, professional search engines are the development trend and have broad application prospects.

The paper introduces the basic structure and principles of a general search engine and analyzes the key technology, working principle, implementation method and design basis of each of its components. It places particular emphasis on web robot (crawler) techniques, Chinese word segmentation, the vector space model (VSM), automatic text categorization, web information indexing and web information retrieval. Building on these techniques, the paper studies in depth how each key technology can be implemented. The implementation adopts multi-threading, feature extraction with term weighting, and similarity ranking. These techniques are effective in increasing the efficiency and quantity of the collection, classification and retrieval of web information.

On the basis of general search engine techniques, and according to the particular characteristics of professional search, the paper designs a Chinese-English professional search engine. It mainly uses a specialization of the general search engine approach, which limits the search range and filters professional information by automatic classification.
At the same time, to keep the design general, the paper adopts a generic design method on which many kinds of professional search engine can easily be built. To improve efficiency and quality, the professional search engine uses several key techniques, such as dynamically revising the word database by analyzing the retrieval log, and dynamically extending the training document set by adding newly classified professional documents. In place of the conventional techniques for Chinese segmentation and indexing, the paper uses simpler and more effective methods: Chinese segmentation based on database views, and a bidirectional index based on database tables.

Following this design, a general professional search engine was implemented using Java as the programming language and Oracle8i as the DBMS. After thorough testing, the current Chinese-English professional search engine has been applied to research on the Chinese Human Brain Project and neuroinformatics, one of the 973 preliminary research projects of China's Ministry of Science and Technology.
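
The abstract names the vector space model, term weighting and similarity ranking but does not give the formulas used. The following is only a minimal sketch of how these pieces commonly fit together, not the thesis's implementation. It assumes documents are already segmented into tokens, uses a smoothed TF-IDF weight, and ranks by cosine similarity; the class name `VsmSketch` and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class VsmSketch {

    /** Raw term frequency of each token in one document. */
    static Map<String, Double> termFrequencies(List<String> tokens) {
        Map<String, Double> tf = new HashMap<>();
        for (String t : tokens) {
            tf.merge(t, 1.0, Double::sum);
        }
        return tf;
    }

    /** Smoothed IDF (an assumed weighting): ln((1 + N) / (1 + df)) + 1. */
    static Map<String, Double> inverseDocumentFrequencies(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String t : new HashSet<>(doc)) {   // count each term once per document
                df.merge(t, 1, Integer::sum);
            }
        }
        Map<String, Double> idf = new HashMap<>();
        int n = docs.size();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            idf.put(e.getKey(), Math.log((1.0 + n) / (1.0 + e.getValue())) + 1.0);
        }
        return idf;
    }

    /** Sparse TF-IDF vector; terms unknown to the collection are dropped. */
    static Map<String, Double> weigh(List<String> tokens, Map<String, Double> idf) {
        Map<String, Double> vec = new HashMap<>();
        for (Map.Entry<String, Double> e : termFrequencies(tokens).entrySet()) {
            Double w = idf.get(e.getKey());
            if (w != null) {
                vec.put(e.getKey(), e.getValue() * w);
            }
        }
        return vec;
    }

    /** Cosine similarity of two sparse vectors; 0 if either vector is empty. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        for (double w : b.values()) {
            normB += w * w;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Document indices ordered from most to least similar to the query. */
    static List<Integer> rank(List<String> query, List<List<String>> docs) {
        Map<String, Double> idf = inverseDocumentFrequencies(docs);
        Map<String, Double> q = weigh(query, idf);
        double[] scores = new double[docs.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            scores[i] = cosine(q, weigh(docs.get(i), idf));
            order.add(i);
        }
        order.sort((x, y) -> Double.compare(scores[y], scores[x]));
        return order;
    }

    public static void main(String[] args) {
        // Toy, made-up documents used only to exercise the sketch.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("brain", "neuron", "imaging"),
                Arrays.asList("search", "engine", "index"),
                Arrays.asList("neuron", "search"));
        System.out.println(rank(Arrays.asList("neuron", "brain"), docs));
    }
}
```

The same cosine score could in principle also serve the filtering step described above, for example by comparing a fetched page against a vector built from the training documents of a professional category and keeping the page only above some threshold. That use is an assumption here, not something the abstract specifies.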
Keywords/Search Tags:Search Engine, Robot, Automatic Categorization, VSM, Feature Extraction