
Focused Crawler Algorithm Based on Formal Concept Analysis

Posted on: 2014-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: X K Wang
GTID: 2268330401458316
Subject: Probability theory and mathematical statistics
Abstract/Summary:
The rapid growth of the mobile Internet poses an enormous challenge for search engines. How a search engine adapts to this change and provides better retrieval service has become a major concern. As an important component of the search engine, the web crawler algorithm has become a research focus. Because the web is large and its content is disorderly, a general-purpose web crawler cannot satisfy users who want specific information or want to crawl a topic of interest. A topic (focused) crawler selectively crawls the web pages relevant to a theme, which effectively reduces the number of pages crawled and improves accuracy, and so meets users' demand for topic-oriented search.

Formal concept analysis (FCA) is a data analysis method based on the concept lattice. Because it represents knowledge intuitively and concisely, FCA has attracted great attention from researchers since it was proposed. It is now widely used in areas such as software engineering, library science, information science, and data mining.

Building on the principles of current topic crawlers, this paper applies formal concept analysis, a data analysis tool, to the topic crawler algorithm: the concept lattice is used for theme correlation analysis, thereby improving the calculation of the theme related degree. The main research work of this paper is as follows. Firstly, this paper studies FCA theory, focusing on its core, the concept lattice, especially the relationships between concepts in the lattice and the structure of the lattice, and then incorporates the concept lattice into the theme crawler algorithm. Secondly, this paper studies the principles of the theme crawler, including its structure, search strategy, PageRank scheduling algorithm and theme related degree, and then improves the calculation of the theme related degree on the basis of the concept lattice. Thirdly, this paper analyses the defects of the PageRank scheduling algorithm and, drawing on the concept lattice, proposes an improved PageRank algorithm.
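To make the notion of a concept lattice more concrete, the following is a minimal Python sketch of how the formal concepts of a small context can be enumerated. It is not the thesis's algorithm: the toy context (pages as objects, terms as attributes), the function names, and the naive enumeration over attribute subsets are all illustrative assumptions, and the approach is only practical for very small contexts.

```python
from itertools import combinations

# Hypothetical toy context (an assumption for illustration):
# objects are pages, attributes are the terms each page contains.
CONTEXT = {
    "page1": {"crawler", "search"},
    "page2": {"crawler", "lattice"},
    "page3": {"crawler", "search", "lattice"},
}


def extent(attrs, context):
    """Return the objects that have every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)


def intent(objs, context):
    """Return the attributes shared by every object in objs."""
    shared = set().union(*context.values())
    for o in objs:
        shared &= context[o]
    return frozenset(shared)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every attribute subset.

    Naive and exponential in the number of attributes.
    """
    all_attrs = sorted(set().union(*context.values()))
    concepts = set()
    for r in range(len(all_attrs) + 1):
        for subset in combinations(all_attrs, r):
            objs = extent(set(subset), context)
            concepts.add((objs, intent(objs, context)))
    return concepts


def is_subconcept(c1, c2):
    """Lattice order: c1 <= c2 when the extent of c1 is contained in that of c2."""
    return c1[0] <= c2[0]


if __name__ == "__main__":
    for objs, attrs in sorted(formal_concepts(CONTEXT), key=lambda c: -len(c[0])):
        print(sorted(objs), sorted(attrs))
```

Each concept pairs a set of pages with exactly the terms they all share, and the sub-concept order between these pairs is what gives the lattice its structure. How the thesis derives a theme related degree from that structure is not specified in the abstract, so it is not sketched here.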
Keywords/Search Tags: Formal concept analysis, Concept lattice, Theme related degree, PageRank