Study On The Key Technologies Of Vertical Search Engine In Components Area

Posted on: 2012-03-18  Degree: Master  Type: Thesis
Country: China  Candidate: X H Su  Full Text: PDF
GTID: 2218330362956510  Subject: Computer software and theory
Abstract/Summary:
Component-based software development is considered an effective way to address the software crisis, and the component library is the infrastructure of this method. However, a single component library is too small to meet the needs of software developers, and the many heterogeneous component libraries cannot interoperate, so finding a suitable component becomes very difficult. With the rapid development of the Internet, some component libraries providing commercial and open source components have appeared online; in addition, the Internet also holds many components that have not been collected by any component library. The emergence and development of vertical search engines provides a solution and a technical basis for searching for components on the Internet. A vertical search engine targets a specific topic area and a specific group of users, and it can improve retrieval accuracy and provide personalized search services through in-depth mining and analysis of the collected data. The key technologies of a vertical search engine include the focused crawling algorithm, component description and structured information extraction, and component indexing and retrieval. The research on focused crawling builds on the Shark Search algorithm: a new focused crawling algorithm, L-Shark Search, is designed by drawing on the way the OPIC algorithm is implemented in Nutch. Experiments show that L-Shark Search crawls more effectively than the original Shark Search algorithm. A component description model named iUCDL is designed based on the information that component libraries on the Internet can provide, and a template-based structured information extraction method is proposed for accurate extraction of component information. By adding XML structure information to the index, faceted component search is supported while the full-text retrieval style of Lucene is preserved. A faceted component matching model is also added, which improves the quality of component retrieval by re-ranking the results in a second sorting pass. Finally, experiments are designed to verify the results of this research.
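As an illustration of the focused crawling baseline mentioned in the abstract, the following is a minimal sketch of how the original Shark Search algorithm is commonly described as scoring an outgoing link. It is not the thesis's L-Shark Search, whose details the abstract does not give. The function names, the whitespace tokenizer, the cosine similarity over raw term counts, and the default values of `delta`, `beta`, and `gamma` are all assumptions chosen for this example.

```python
import math
from collections import Counter


def cosine_sim(query: str, text: str) -> float:
    """Cosine similarity over raw term counts (a stand-in relevance measure)."""
    q = Counter(query.lower().split())
    t = Counter(text.lower().split())
    dot = sum(q[w] * t[w] for w in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in t.values())
    )
    return dot / norm if norm else 0.0


def inherited_score(query: str, parent_text: str, parent_inherited: float,
                    delta: float = 0.5) -> float:
    """Score a child inherits from its parent page, decayed by delta.

    A relevant parent passes on its own relevance; an irrelevant parent
    passes on only what it inherited itself.
    """
    relevance = cosine_sim(query, parent_text)
    return delta * relevance if relevance > 0 else delta * parent_inherited


def neighborhood_score(query: str, anchor_text: str, anchor_context: str,
                       beta: float = 0.8) -> float:
    """Combine evidence from the anchor text and the text surrounding the link."""
    anchor = cosine_sim(query, anchor_text)
    # If the anchor itself matches, the context is treated as fully relevant.
    context = 1.0 if anchor > 0 else cosine_sim(query, anchor_context)
    return beta * anchor + (1 - beta) * context


def potential_score(query: str, parent_text: str, parent_inherited: float,
                    anchor_text: str, anchor_context: str,
                    gamma: float = 0.5, delta: float = 0.5,
                    beta: float = 0.8) -> float:
    """Priority of an uncrawled link: weighted mix of inherited and local evidence."""
    inherited = inherited_score(query, parent_text, parent_inherited, delta)
    neighborhood = neighborhood_score(query, anchor_text, anchor_context, beta)
    return gamma * inherited + (1 - gamma) * neighborhood


if __name__ == "__main__":
    query = "java component"
    parent = "java component library"
    relevant = potential_score(query, parent, 0.0, "java component", "download")
    unrelated = potential_score(query, parent, 0.0, "contact us", "about the site")
    print(relevant > unrelated)
```

In a crawler, links would be kept in a priority queue ordered by `potential_score`, so that links whose anchors and parent pages match the topic are fetched first.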
Keywords/Search Tags:component description, vertical search, focus crawling, information extraction, index improving