
Research On Subject And Predicate Relation Identification And Theme Relevance Computation Technology

Posted on: 2010-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: X Yang
GTID: 2178360308478797
Subject: Computer software and theory
Abstract/Summary:
In recent years, with social development and advances in science and technology, the volume of information has grown rapidly. This information explosion has brought rapid growth in electronic documents, so document retrieval, classification, and management are becoming increasingly difficult. Traditional text processing uses the similarity of two documents as a stand-in for their theme relevance. But similar documents are not necessarily relevant, and relevant documents are not necessarily similar. The theme relevance of two documents, discussed in this paper, is the key technology for automatically identifying relationships between documents.

Theme relevance calculation means computing the degree of relevance between the content of two documents. Many studies have shown that large-scale Chinese domain knowledge, comprising a large number of entities and their domain background, is important for many technologies, including relevance calculation. Analyzing the relationships between words is an important way to obtain domain knowledge.

The technique for obtaining subject-predicate relationships discussed in this paper helps in acquiring domain knowledge. A subject-predicate relationship holds when a noun is the subject of a sentence and a verb is the predicate of that same sentence. Building on the traditional statistical collocation analysis method, this paper introduces heuristic rules and syntactic information to analyze subject-predicate relationships.

Theme relevance computation is the main topic of this paper. The paper first applies the vector space model to theme relevance computation and uses cosine similarity to compute relevance. In the vector space model, important features are often drowned out by many features with weak distinguishing ability; to address this, the paper uses an algorithm based on a tf_idf threshold to extract the keywords of a document. It then introduces the synonym word forest to compute semantic similarity, along with an improved semantic similarity, to resolve the potential matching relationships between different features. Finally, it introduces a domain knowledge base to improve theme relevance computation by computing the domain distribution characteristics of the two documents. The theme relevance technology is then applied to an advertising recommender system for improvement and comparative experiments. The experimental results show that the relevance calculation method based on the domain knowledge base achieves remarkable results.
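
The vector space model step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the tf-idf formula is one common variant, the threshold value is arbitrary, the function names (`tfidf_vectors`, `keep_keywords`, `cosine`) are hypothetical, and the documents are assumed to be already segmented into word tokens.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one {term: tf-idf weight} dict per tokenized document.

    Assumption: tf = count / document length, idf = log(N / df).
    The thesis may use a different weighting.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def keep_keywords(vector, threshold):
    """Keep only terms whose weight reaches the (assumed) threshold."""
    return {term: w for term, w in vector.items() if w >= threshold}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[term] for term, w in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy, pre-tokenized documents (hypothetical data).
docs = [
    ["camera", "lens", "photo", "the", "the"],
    ["camera", "photo", "tripod", "the"],
    ["stock", "market", "price", "the", "the"],
]
keywords = [keep_keywords(v, threshold=0.05) for v in tfidf_vectors(docs)]
print(cosine(keywords[0], keywords[1]))  # two photography documents
print(cosine(keywords[0], keywords[2]))  # photography vs. finance
```

In this sketch the threshold removes terms such as "the" that appear in every document and therefore carry no distinguishing weight, which is the effect the keyword extraction step is meant to achieve. Matching between different but synonymous terms, which the abstract handles with semantic similarity, is not covered here.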
Keywords/Search Tags:relevance computation, subject predicate relationship, collocation analysis, domain knowledge base, advertising recommender system