Extracting Local Web Communities Using Lexical Similarity

Posted on:2011-05-09

Degree:Master

Type:Thesis

Country:China

Candidate:W Xu

Full Text:PDF

GTID:2178330332961409

Subject:Computer application technology

Abstract/Summary:


With the rapid growth of Web information, retrieving latent, useful information from this enormous volume and using it efficiently has become an increasingly important and challenging research problem in the information field. Web community discovery is therefore valuable both in practice and in academic study. The task of Web community extraction is to find all cohesive Web pages related to a given query. When used in search engines, Web community extraction algorithms can help improve the performance and precision of Web information retrieval and, to some extent, support the clustering of Web information.

Based on an analysis of the current Web and the characteristics of its data, Web information retrieval models, and search engine architecture, this thesis studies the classical Web community discovery algorithms in detail. Most of these algorithms rely on link analysis and ignore the textual content of Web pages. The thesis proposes an improved algorithm based on Flake's maximum-flow method. The improved algorithm accounts for differences in importance between edges and assigns each edge a carefully designed capacity derived from the lexical similarity of the Web pages it connects. Given a specific query, the method also supports a new and efficient ranking scheme for members of the extracted community, which sharpens the distinction between members according to their content similarity to the seed pages. An aggregation algorithm is also proposed that, according to the user's needs, constructs the vicinity graph at the granularity of sites rather than pages. Experimental results indicate that the approach handles a variety of data sets efficiently, avoiding topic drift while increasing both the size and the quality of the extracted community.
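The general idea of max-flow community extraction with lexically weighted capacities can be sketched as follows. This is not the thesis's implementation. It is a minimal illustration under several assumptions. Lexical similarity is plain term-frequency cosine similarity. Each link gets a hypothetical capacity of k * (alpha + (1 - alpha) * similarity), where k is the number of seeds and alpha = 0.1 is an invented floor. Only non-seed pages are connected to the virtual sink, each with unit capacity. Members are ranked by their mean similarity to the seeds. A page's "site" is taken to be its URL host. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict, deque
from urllib.parse import urlparse


def cosine(a, b):
    """Cosine similarity of two term-frequency Counters."""
    dot = sum(n * b[w] for w, n in a.items() if w in b)
    na = math.sqrt(sum(n * n for n in a.values()))
    nb = math.sqrt(sum(n * n for n in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def add_edge(cap, u, v, c):
    cap[u][v] = cap[u].get(v, 0.0) + c
    cap[v].setdefault(u, 0.0)  # reverse entry keeps the residual graph complete


def source_side(cap, s, t, eps=1e-12):
    """Edmonds-Karp max flow; returns nodes reachable from s in the final residual graph."""
    flow = defaultdict(float)
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if v not in parent and c - flow[(u, v)] > eps:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)  # source side of the minimum cut
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] - flow[(u, v)] for u, v in path)
        for u, v in path:
            flow[(u, v)] += push
            flow[(v, u)] -= push  # skew symmetry gives residual capacity on reverse edges


def extract_community(pages, links, seeds, alpha=0.1):
    """Flake-style cut with hypothetical lexically weighted capacities; returns (community, ranking)."""
    bags = {p: Counter(text.lower().split()) for p, text in pages.items()}
    k = len(seeds)
    cap = defaultdict(dict)
    s, t = "__source__", "__sink__"
    for u, v in links:  # links are treated as undirected
        c = k * (alpha + (1 - alpha) * cosine(bags[u], bags[v]))
        add_edge(cap, u, v, c)
        add_edge(cap, v, u, c)
    for p in seeds:
        add_edge(cap, s, p, math.inf)
    for p in pages:
        if p not in seeds:
            add_edge(cap, p, t, 1.0)
    community = source_side(cap, s, t) - {s}
    mean_sim = lambda p: sum(cosine(bags[p], bags[q]) for q in seeds) / k
    ranked = sorted(community - set(seeds), key=mean_sim, reverse=True)
    return community, ranked


def aggregate_to_sites(page_links):
    """Collapse page-level links into undirected site-level links, dropping intra-site links."""
    sites = set()
    for u, v in page_links:
        su, sv = urlparse(u).netloc, urlparse(v).netloc
        if su != sv:
            sites.add(tuple(sorted((su, sv))))
    return sites


pages = {
    "a": "max flow web community link analysis",
    "b": "web community extraction max flow",
    "c": "lexical similarity web community",
    "f": "web community survey",
    "d": "cooking recipes pasta sauce",
    "e": "pasta sauce tomato recipes",
}
links = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "f"), ("c", "d"), ("d", "e")]
community, ranked = extract_community(pages, links, seeds=["a", "b"])
print(sorted(community))  # ['a', 'b', 'c', 'f']
print(ranked)  # ['f', 'c']
```

In this toy graph, the link from "c" to the off-topic page "d" shares no terms, so it keeps only the small floor capacity. The minimum cut then falls on that link, which keeps the cooking pages out of the community. This is the kind of topic-drift effect the abstract describes, shown here under the assumed weighting rather than the thesis's own.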

Keywords/Search Tags:

Information Retrieval, Community Extraction, Maximum Flow Algorithm, Lexical Similarity


Related items

1	Lexical semantic similarity and its application to business catalog retrieval
2	Research And Realization On Web Community Mining Algorithm
3	Research And Application On Web Community Mining Technology Based On Net Flow
4	Design And Implementation Of Information Retrieval System Specific Domain
5	Chinese Informal Lexical Normalization Based On O&A Community
6	Research And Improvement For Web Community Identification Technique
7	Chinese Word Semantic Similarity Measure And Its Application In Cross-language Information Retrieval
8	Relevance Calculation Of Web Text Based On Lexical Cohesion
9	Lexical-semantic Similarity Calculation And Its Application In The Revision Of ISO 860
10	Research On Solving Maximum Flow Problem In Large-scale Network