Domain-specific Search Engine Based On TSE

Posted on: 2009-08-28 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Zhang | Full Text: PDF
GTID: 2178360272476421 | Subject: Computer technology
Abstract/Summary:
A search engine collects information from the Internet so that users can query its collection instead of searching the whole Internet themselves. It consists of three parts: information collection, information processing, and user querying. It is a website that provides information retrieval and related services, using programs to classify information gathered from the Internet and help users find what they need.

With the rapid growth of Web information, search engine technology has developed steadily. Users face great difficulty finding information in the vast "data ocean", and search engines emerged to solve this problem. A search engine discovers and collects information according to a certain strategy, then interprets, extracts, organizes, and processes it. It therefore acts as an information navigator and plays an increasingly important role in the daily life of Internet users. Search engines have become the second core Internet technology after portal sites. They draw on the theory and techniques of many fields, including information retrieval, artificial intelligence, computer networks, distributed systems, databases, data mining, digital libraries, and natural language processing, which makes them both interdisciplinary and challenging. As the Internet develops and the volume of information grows ever faster, search engines become more important to people. Search engine technology has therefore become a target of research and development in both industry and academia.

A domain-specific search engine, also called a vertical search engine, targets a particular field. It is a branch of search engine technology and serves as an integrated repository of information for a given field. It collects data in a targeted way, processes it, and returns it to users in a suitable form. A vertical search engine serves specific fields, specific groups of people, and specific needs. It can be viewed simply as a subset of the general search engine, and its success suggests that Internet search must be diversified. General search engines cannot serve specialized fields well. User needs drive the development of search engines and push them toward finer granularity, with specialized services for specialized needs. In this sense, general search engines create the market for vertical search engines, and this is a direction in which search engines are developing.

The main difference between a vertical search engine and a general search engine is that a vertical search engine collects web page information in structured form and stores it according to a predefined structure. A general web search engine treats the web page as its smallest unit, whereas a vertical search engine treats the structured data record as its smallest unit. It saves the records to a database, then analyzes the data and segments it into terms in order to serve user queries.

Domain-specific search has become an important research branch of information retrieval and has developed rapidly in recent years. However, some issues still need further study in order to promote its practical application and improve its effectiveness and efficiency. Search engines dedicated to the field of history are currently uncommon in China, so this dissertation has some value for vertical search engine research.

TSE (Tiny Search Engine) is a small search engine model based on the Tianwang search engine and runs under the Linux operating system. It is open-source software developed by Dr. Yan Hongfei of the network lab of Peking University. The aim of TSE is to provide a learning environment for people who want to study search engines. However, TSE is a general search engine, so it does not perform well in domain-specific search and shows deficiencies in effectiveness and accuracy.

This thesis adds a specific field, history, to the TSE system and adjusts its Chinese word segmentation and indexing. The work includes:
(1) Adding a history-specific wordlist to the original TSE wordlist.
(2) Improving the original TSE segmentation algorithm.
(3) Changing the relevant web crawling parameters to make TSE more suitable for domain-specific search.

Because the Ideal Technology Institute of a certain university has expertise in this field, the institute supplies 200 history documents as test samples. The results of the modified system are compared with those of the original system to produce a results report and to give suggestions for further study.
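To illustrate why step (1) matters, here is a minimal sketch of dictionary-based Chinese segmentation with and without a domain wordlist. It assumes a forward maximum matching segmenter, which the abstract does not confirm as TSE's actual algorithm; the function names, the tiny wordlists, and the sample sentence are all hypothetical.

```python
def load_wordlist(general_words, domain_words):
    """Merge a general wordlist with a domain wordlist into one lookup set."""
    return set(general_words) | set(domain_words)


def forward_max_match(text, wordlist, max_len=4):
    """Segment text greedily, taking the longest dictionary word at each position.

    Characters not covered by any dictionary word are emitted one at a time.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in wordlist:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical wordlists: a tiny general list and a tiny history list.
general = ["研究", "历史", "中国"]
history = ["鸦片战争", "辛亥革命"]

text = "研究鸦片战争历史"
before = forward_max_match(text, set(general))
after = forward_max_match(text, load_wordlist(general, history))

print(before)  # the history term is split into single characters
print(after)   # the history term survives as one token
```

With only the general wordlist, the history term falls apart into single characters, which hurts index quality; once the domain wordlist is merged in, the term is indexed as a single unit.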
Keywords/Search Tags: Domain-specific Search Engine, Segmentation, Indexing