Research On Subject And Predicate Relation Identification And Theme Relevance Computation Technology

Posted on:2010-01-30

Degree:Master

Type:Thesis

Country:China

Candidate:X Yang

Full Text:PDF

GTID:2178360308478797

Subject:Computer software and theory

Abstract/Summary:


In recent years, with social development and advances in science and technology, the volume of information has grown rapidly, and this explosion has brought about a rapid growth in electronic documents. As a result, document retrieval, classification and management are becoming increasingly difficult. Traditional text processing technology uses the similarity of two documents as a substitute for their theme relevance, but similar documents are not necessarily relevant, and relevant documents are not necessarily similar. The theme relevance of two documents, the subject of this paper, is the key technology for automatically identifying relationships between documents. Theme relevance calculation means computing, by some method, the degree to which the contents of two documents are related. Many studies have shown that large-scale Chinese domain knowledge, covering a large number of entities and their domain background, is important for many technologies, including relevance calculation.

Analyzing the relationships between words is an important way to obtain domain knowledge, and the technique for extracting subject-predicate relationships discussed in this paper helps to obtain it. A subject-predicate relationship holds between a noun and a verb when, in the same sentence, the noun is the subject and the verb is the predicate. Building on the traditional statistical collocation analysis method, this paper introduces heuristic rules and syntactic information to analyze subject-predicate relationships.

Theme relevance computation is the main topic of this paper. The paper first applies the vector space model to theme relevance computation and uses cosine similarity to compute relevance. In the vector space model, important features are often drowned out by many features with weak discriminating ability; to address this, the paper uses an algorithm based on a tf-idf threshold to extract the keywords of each document.
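As a rough illustration of the vector-space step, the sketch below keeps only the terms whose tf-idf weight reaches a threshold and compares the resulting keyword vectors with cosine similarity. It is a minimal sketch under stated assumptions, not the thesis's implementation: documents are assumed to be already word-segmented token lists, tf is assumed to be `count / len(doc)` and idf `log(N / df)`, and the names `tf_idf_keywords`, `cosine` and the threshold values are hypothetical, since the abstract does not give the exact weighting scheme or threshold.

```python
import math
from collections import Counter


def tf_idf_keywords(doc, corpus, threshold):
    """Keep the terms of `doc` whose tf-idf weight reaches `threshold`.

    Assumption: documents are word-segmented token lists,
    tf = count / len(doc) and idf = log(N / df). The thesis abstract
    does not specify its exact weighting scheme or threshold value.
    """
    n_docs = len(corpus)
    doc_sets = [set(d) for d in corpus]  # fast document-frequency lookups
    counts = Counter(doc)
    keywords = {}
    for term, count in counts.items():
        df = sum(1 for s in doc_sets if term in s)
        if df == 0:
            continue  # term unseen in the corpus: no idf estimate
        weight = (count / len(doc)) * math.log(n_docs / df)
        if weight >= threshold:
            keywords[term] = weight
    return keywords


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy, already-segmented corpus (English tokens used for readability).
corpus = [
    ["stock", "market", "price", "rise", "market"],
    ["stock", "price", "fall", "investor"],
    ["football", "match", "goal", "team"],
]
keywords = [tf_idf_keywords(d, corpus, threshold=0.05) for d in corpus]
print(cosine(keywords[0], keywords[1]))  # shares "stock" and "price": > 0
print(cosine(keywords[0], keywords[2]))  # no shared keywords: 0.0
```

The threshold trades noise against overlap: in this toy corpus, raising it to 0.1 leaves the first document with only "market" and "rise", so it no longer shares any keyword with the second document and their cosine drops to zero. That loss of exact-match overlap is the kind of gap that semantic similarity between different features is meant to bridge.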
The paper then introduces the synonym word forest (Tongyici Cilin) to compute semantic similarity, and improves this semantic similarity measure to capture potential matches between different features. Finally, it introduces a domain knowledge base to improve theme relevance computation by computing the domain distribution characteristics of the two documents. The theme relevance technology is then applied to an advertising recommender system for improvement and comparative experiments. The experimental results show that the relevance calculation method based on the domain knowledge base achieves remarkable results.
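To make the domain-distribution idea concrete, the sketch below maps each keyword to a domain, aggregates keyword weights into a normalized per-domain distribution for each document, and mixes the cosine of those distributions with the plain keyword cosine. Everything here is an assumption for illustration: `DOMAIN_LEXICON` is a hypothetical toy stand-in for the domain knowledge base, the per-domain summation and cosine comparison are one plausible reading of "field distribution character", and the linear mix with weight `alpha` is not taken from the thesis. The synonym-forest similarity step is not modelled.

```python
import math
from collections import defaultdict

# Hypothetical toy lexicon standing in for the domain knowledge base:
# each term maps to the single domain it belongs to.
DOMAIN_LEXICON = {
    "stock": "finance", "price": "finance", "investor": "finance",
    "market": "finance", "football": "sports", "goal": "sports", "team": "sports",
}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {key: weight} dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def domain_distribution(keywords, lexicon):
    """Sum keyword weights per domain and normalize them to sum to 1."""
    dist = defaultdict(float)
    for term, weight in keywords.items():
        domain = lexicon.get(term)
        if domain is not None:  # terms outside the lexicon are ignored
            dist[domain] += weight
    total = sum(dist.values())
    return {d: w / total for d, w in dist.items()} if total > 0 else {}


def combined_relevance(kw_a, kw_b, lexicon, alpha=0.5):
    """Linear mix of keyword cosine and domain-distribution cosine.

    `alpha` is a hypothetical mixing weight, not a value from the thesis.
    """
    lexical = cosine(kw_a, kw_b)
    domain = cosine(domain_distribution(kw_a, lexicon),
                    domain_distribution(kw_b, lexicon))
    return alpha * lexical + (1 - alpha) * domain


doc_a = {"stock": 0.3, "rise": 0.2}      # keyword weights, e.g. from tf-idf
doc_b = {"investor": 0.4, "fall": 0.3}   # no keyword overlap with doc_a
doc_c = {"football": 0.5, "goal": 0.2}
print(combined_relevance(doc_a, doc_b, DOMAIN_LEXICON))  # 0.5
print(combined_relevance(doc_a, doc_c, DOMAIN_LEXICON))  # 0.0
```

Here `doc_a` and `doc_b` share no keyword, so their keyword cosine is zero, yet both fall entirely in the "finance" domain and still receive a nonzero relevance score. This mirrors the abstract's point that relevant documents need not be lexically similar.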

Keywords/Search Tags:

relevance computation, subject predicate relationship, collocation analysis, domain knowledge base, advertising recommender system


Related items

1	Design And Realization Of Domain Specific Knowledge Base Extraction Syste
2	Some Key Problems On Isogeometry Collocation Method In CAE
3	Research On The Problem Of The Construction Of Knowledge Base Chinese Question-answering-system Automatically
4	Design And Implementation Of Subject Knowledge Base Supporting Semantic Reasoning
5	Design And Implementation Of Audit Subject Knowledge Base
6	Research On Multimedia Advertising
7	Research And Implementation Of Predicate Mapping Technology For Knowledge Base Question Answering
8	Analysis Technique Based On Knowledge Of Subject-oriented Query
9	Construction And Application Of Binary Collocation Semantic Knowledge Base Based On Multiple Knowledge Sources
10	Research On Predicate Mapping Method In Knowledge Base Natural Language Question Answering