
Research and Implementation of Key Technologies of Panoramic Search Engine

Posted on: 2011-09-21   Degree: Master   Type: Thesis
Country: China   Candidate: M B Cheng
GTID: 2178330338479977   Subject: Computer Science and Technology
Abstract/Summary:
As Web 2.0 technology progresses, web services have shifted from a publishing-based style to an interaction-based style. Services such as wikis, blogs, forums, community question answering (community QA), and picture and video sharing have emerged as the representative services on the web. As these new services, which embody the ideas of freedom and openness, have grown, tags (bookmarks), regarded as an open way of organizing content, have been widely adopted to organize their resources. These services have now accumulated a huge number of tagged resources, which have become a valuable asset for other applications.

Building on these interaction-based services, this dissertation proposes a new concept of search: the Panoramic Search Engine (PSE). For a given user query, a PSE re-arranges the search results not only by their relevance to the query but also by their style and literary form (e.g., wiki, blog, forum, news, picture, and video), so that the user receives a magazine-style result page that presents information from every aspect, with a more elegant presentation and more detailed content. This dissertation describes our implementation of a PSE, named FOXINFO, which further implements topic search by utilizing the tags of its crawled resources. We then discuss several key techniques of FOXINFO and examine relevant topic mining in detail. The main contributions of this work are:

1) We present the architecture of the PSE system and discuss several of its key techniques:
a) For data collection, crawling techniques are compared and the implementation of a wrapper for news pages is presented;
b) For data indexing, we discuss how to effectively manage resources of multiple styles and propose a solution;
c) For the online service, a distributed online search architecture is presented, together with its communication principles and the workflow of each module.
2) We study relevant topic (tag) mining. Three methods, document co-occurrence ratio (DCR), frequency of document co-occurrence (FDC), and the vector space model (VSM), are compared on a test dataset collected from Baidu Zhishi. The experimental results indicate that DCR achieves the best performance.
3) Building on the research into relevant tag mining, we further study how to construct a hierarchy tree from a set of tags. Using DCR and VSM as relevance measures, we propose the Tag Cohesion Algorithm for constructing tag hierarchy trees. In addition, by modifying a traditional clustering algorithm, we propose a second, clustering-based algorithm for the same task. Experiments show that both algorithms perform well on the Baidu Zhishi dataset.
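The idea of re-arranging results by literary form as well as by relevance can be illustrated with a minimal sketch. The abstract does not say how FOXINFO lays out its magazine-style page, so the grouping rule used here (one section per style, results sorted by score inside each section, sections ordered by the score of their best result, a cap of `per_section` results each) is an assumption, and the names `Result` and `magazine_page` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    style: str    # literary form, e.g. "wiki", "blog", "news", "video"
    score: float  # relevance score of the result to the query


def magazine_page(results, per_section=3):
    """Hypothetical layout: one section per style, each holding its top
    `per_section` results; sections ranked by their best result's score."""
    groups = defaultdict(list)
    for r in results:
        groups[r.style].append(r)
    # Rank each section by the relevance of its best result.
    order = sorted(groups, key=lambda s: max(r.score for r in groups[s]), reverse=True)
    return [
        (style, sorted(groups[style], key=lambda r: r.score, reverse=True)[:per_section])
        for style in order
    ]


results = [
    Result("a", "blog", 0.62), Result("b", "wiki", 0.91),
    Result("c", "blog", 0.80), Result("d", "video", 0.40),
    Result("e", "wiki", 0.55),
]
for style, items in magazine_page(results, per_section=2):
    print(style, [r.url for r in items])
```

Ordering sections by their best result keeps the plain relevance ranking visible at the top of the page while still presenting every literary form as its own block; a real system might instead use a fixed section order or per-style quotas.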
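The abstract names the three tag relevance measures it compares but does not define them. The sketch below uses plausible definitions, which are assumptions rather than the thesis's own formulas: FDC as the number of documents tagged with both tags, DCR as that count divided by the number of documents tagged with either tag (a Jaccard ratio), and VSM as the cosine similarity of the two tags' co-occurrence vectors over the other tags. The toy documents and helper names are hypothetical.

```python
import math
from collections import Counter


def build_postings(docs):
    """Map each tag to the set of document ids carrying it."""
    postings = {}
    for i, tags in enumerate(docs):
        for t in tags:
            postings.setdefault(t, set()).add(i)
    return postings


def fdc(postings, a, b):
    """Assumed FDC: number of documents tagged with both a and b."""
    return len(postings.get(a, set()) & postings.get(b, set()))


def dcr(postings, a, b):
    """Assumed DCR: co-occurring documents over documents with either tag."""
    union = postings.get(a, set()) | postings.get(b, set())
    return fdc(postings, a, b) / len(union) if union else 0.0


def context_vector(docs, t):
    """Counts of the other tags that appear in documents tagged with t."""
    vec = Counter()
    for tags in docs:
        if t in tags:
            vec.update(u for u in tags if u != t)
    return vec


def vsm(docs, a, b):
    """Assumed VSM: cosine similarity of the two tags' context vectors."""
    va, vb = context_vector(docs, a), context_vector(docs, b)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


docs = [
    {"python", "programming"},
    {"python", "snake"},
    {"python", "programming", "code"},
    {"java", "programming", "code"},
]
postings = build_postings(docs)
print(fdc(postings, "python", "programming"))  # 2
print(dcr(postings, "python", "programming"))  # 0.5
print(fdc(postings, "python", "java"))         # 0
print(round(vsm(docs, "python", "java"), 3))   # 0.866
```

The example shows why the measures can disagree: "python" and "java" never share a document, so FDC and DCR are zero, yet their shared context ("programming", "code") gives a high VSM score. Which behavior is preferable depends on whether related tags are expected to co-occur directly.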
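To make the idea of building a hierarchy tree from a tag set and a relevance measure concrete, here is a minimal greedy sketch. It is not the thesis's Tag Cohesion Algorithm or its clustering-based algorithm, whose details the abstract does not give; it is a simple, commonly used heuristic: visit tags from most to least frequent and attach each one under the already-placed tag it is most relevant to, or under the root if no placed tag reaches a threshold. The Jaccard relevance, the `threshold` value, and all function names are assumptions chosen for illustration.

```python
def build_postings(docs):
    """Map each tag to the set of document ids carrying it."""
    postings = {}
    for i, tags in enumerate(docs):
        for t in tags:
            postings.setdefault(t, set()).add(i)
    return postings


def jaccard(postings, a, b):
    """Stand-in relevance: shared documents over documents with either tag."""
    union = postings[a] | postings[b]
    return len(postings[a] & postings[b]) / len(union) if union else 0.0


def build_tag_tree(postings, threshold=0.2):
    """Greedy sketch: more frequent tags are placed first and become candidate
    parents; returns {tag: parent}, where None means a child of the root."""
    order = sorted(postings, key=lambda t: (-len(postings[t]), t))
    parent, placed = {}, []
    for t in order:
        best, best_score = None, 0.0
        for p in placed:
            s = jaccard(postings, t, p)
            if s > best_score:
                best, best_score = p, s
        parent[t] = best if best_score >= threshold else None
        placed.append(t)
    return parent


def render(parent, node=None, depth=0):
    """List the tree as indented lines, children in alphabetical order."""
    lines = []
    for child in sorted(t for t, p in parent.items() if p == node):
        lines.append("  " * depth + child)
        lines.extend(render(parent, child, depth + 1))
    return lines


docs = [
    {"programming", "python"}, {"programming", "python", "django"},
    {"programming", "java"}, {"programming", "java", "spring"},
    {"programming", "python"}, {"cooking", "baking"}, {"cooking"},
]
tree = build_tag_tree(build_postings(docs))
print("\n".join(render(tree)))
```

Ordering by frequency makes general tags such as "programming" tend to become ancestors of specific ones such as "django"; the threshold decides when a tag is unrelated enough to start a new top-level branch, as "cooking" does here.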
Keywords/Search Tags: Panoramic Search Engine, Tag Relevance Measure, Hierarchy Tag Tree