
Relevance Calculation Of Web Text Based On Lexical Cohesion

Posted on: 2008-02-17  Degree: Master  Type: Thesis
Country: China  Candidate: D Chen  Full Text: PDF
GTID: 2178360245497663  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid growth of information on the Internet, Web-related services are multiplying and Web information is widely used in many areas. In this paper, we explore methods for calculating the relevance of web texts. We focus mainly on the following two issues.

The first is obtaining the web text, that is, extracting the main text of a web page, which is a branch of information extraction. A web page is a kind of semi-structured text. It contains a great deal of information, including the Chinese-character text that discusses a single topic. It also contains advertisements placed for profit, web links and anchor texts that point to other web pages, and source code for the web browser to read. The task is to extract the Chinese-character text that discusses the same topic while filtering out the irrelevant parts.

The second is computing the relevance of two texts. The relevance of two texts refers to the degree to which the two texts are connected. The vector space model is the usual way to calculate this degree of relevance.

This paper applies the maximum entropy model to web text extraction for the first time and gives a web text extraction algorithm based on that model. For the relevance of two texts, we study the relation between lexical cohesion and the topic of a text, and analyze the factors of lexical cohesion that influence the text topic. Combining these factors, we give a method for computing the relevance of two texts based on lexical cohesion. It includes a way to describe a text using lexical chains (LCDR), the calculation of the weight of a lexical chain (WCLC), and a text matching algorithm (LCDM). Experiments show that this method improves the recall and precision of relevance calculation for two web texts.
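For orientation, the vector space model baseline mentioned in the abstract can be sketched as a cosine similarity over term-frequency vectors. This is only a minimal illustration of that conventional baseline, not the lexical-chain method (LCDR, WCLC, LCDM) proposed in the thesis. The function name `cosine_relevance` and the use of raw term counts as weights are assumptions made for this example; the input is assumed to be already segmented into tokens.

```python
import math
from collections import Counter


def cosine_relevance(tokens_a, tokens_b):
    """Cosine similarity between two token lists (hypothetical helper).

    Each text is represented as a raw term-frequency vector; real
    systems usually apply a weighting scheme such as TF-IDF instead.
    """
    vec_a = Counter(tokens_a)
    vec_b = Counter(tokens_b)

    # Dot product over the terms the two texts share.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())

    # Euclidean norms of the two term-frequency vectors.
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))

    # An empty text has no direction, so treat its relevance as zero.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    text_a = ["web", "text", "extraction"]
    text_b = ["web", "text", "relevance"]
    print(cosine_relevance(text_a, text_b))
```

Because this baseline treats every term as independent, it ignores the cohesion between related words, which is the gap the lexical-chain approach in the abstract is meant to address.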
Keywords/Search Tags: relevance, lexical cohesion, lexical chain, maximum entropy model, extraction