
Research On NLP Technologies And Application In Chinese Information Retrieval

Posted on: 2006-04-14
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X W Liu
Full Text: PDF
GTID: 1118360212489255
Subject: Computer application technology
Abstract/Summary:
Chinese Information Retrieval (CIR) is an important branch of Information Retrieval and has developed rapidly in recent years. However, several issues still need further study in order to improve the effectiveness and efficiency of today's CIR systems. This paper uses NLP technologies based on statistics and algebra, studies methods for processing documents at the word level and the document level, and presents solutions to several key problems in CIR.

This paper introduces a text clustering method based on the VSM (vector space model). We propose a new clustering model that includes a feature-vector adjustment step, and give a feature evaluation function that can be used to extract text features and improve the clustering algorithm. The proposed clustering model has been shown to be feasible.

Automatic summarization is a key technique for CIR. This paper provides an algorithm that automatically summarizes a document by extracting subtopics from its sentences. The algorithm is based on statistics and partial-understanding knowledge, in order to produce better summaries and to remove the restriction to a particular information domain. To this end, a new module of mutual dependence is also put forward and used for feature selection, so that accurate features are selected for the summarization algorithm. New rules for evaluating sentences are then proposed. Furthermore, a new task-based algorithm for evaluating summaries is offered.

In most retrieval systems, the user's need is represented by query keywords. In practice, there is a gap between the user's real need and the query words. Reducing this gap is the key problem in implementing a user-oriented information system. This paper puts forward a user-oriented method of query expansion and an adaptive modification model oriented to users' interests. The model can increase retrieval precision and make the returned webpages better satisfy users. Furthermore, the model can determine the number of expanded query words and optimize the regulating factors.
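
For readers unfamiliar with the vector space model (VSM) named in the abstract, the following is a minimal sketch of the general idea: documents become term-weight vectors, and similarity between documents is the cosine of the angle between those vectors. It is an illustration only and is not the dissertation's clustering model, its feature-vector adjustment step, or its feature evaluation function. The function names `tfidf_vectors` and `cosine` and the sample documents are hypothetical, and the TF-IDF weighting is an assumed, commonly used choice. Chinese text would first need word segmentation; the sketch assumes the documents are already tokenized.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of tokenized documents.

    Assumption: plain TF * log(N / df) weighting, chosen for illustration.
    """
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical, already-tokenized sample documents.
docs = [
    ["information", "retrieval", "system"],
    ["information", "retrieval", "model"],
    ["text", "clustering", "algorithm"],
]
vectors = tfidf_vectors(docs)
sim_related = cosine(vectors[0], vectors[1])
sim_unrelated = cosine(vectors[0], vectors[2])
```

A clustering algorithm built on the VSM would group documents whose pairwise similarity is high; here the first two documents share terms and score higher than the first and third, which share none.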
Keywords/Search Tags: Chinese Information Retrieval, NLP Technology, Text Clustering, Automatic Summarization, Query Expansion