Research And System Implement On Chinese Documents Retrieval Based On Content

Posted on: 2007-10-10  Degree: Master  Type: Thesis
Country: China  Candidate: J Zhang
GTID: 2178360212455244  Subject: Information Science
Abstract/Summary:
This thesis studies the implementation of a retrieval system for Chinese computer science documents. The development corpus contains 167 articles from "Journal of Software" published between April 2004 and October 2004, 75 articles from other computer journals, and 75 computer-related articles from the Web. The evaluation corpus contains 999 articles from "Journal of Software" published between April 2000 and February 2005. We rank the sentences in each article by weight, select a range of the top-ranked sentences according to the user's choice, and retrieve articles against the selected sentences. During preprocessing we tag the part of speech of every word to reduce ambiguity. The user can increase the number of selected sentences by enlarging the threshold, up to the total number of sentences in the article. Furthermore, we analyze users' frequent question sentences, derive patterns for user questions, and support retrieval with natural language queries. Finally, we compare the system with Chinese journal full-text databases; most of the results from our system are superior.

Compared with traditional retrieval systems, our system has five distinguishing features. First, it is content-based: it extracts the most important sentences from each article as the retrieved passage, which represents the topic of the article more accurately than a common retrieval system does. Second, it retrieves documents that have been tagged with parts of speech, and our experiments provide data on the relation between part-of-speech tagging of Chinese documents and the performance of Chinese retrieval systems. Third, the user can retrieve over the full text by enlarging the threshold. Fourth, the system has some data mining capability: words without high weight can still be retrieved when they appear beside words with high weight. Fifth, the system lets the user phrase a retrieval question in natural language. It filters the question sentence, which raises precision and spares users effort.

The first chapter of this thesis covers recent developments in computer-based retrieval, including the history of computer information retrieval, the common methods and models, the difficulties of full-text retrieval, and the problems that remain to be solved. The second chapter covers part-of-speech tagging and abstract-based document retrieval, including the theory of part-of-speech tagging for English and Chinese words, some important corpora, the necessity and practicability of abstract-based retrieval, and the idea of retrieval with natural language. The third chapter covers the theories and methods related to the system. The fourth chapter is about...
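
The sentence-ranking and threshold idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the weighting formula, so the weight used here (mean in-document frequency of a sentence's tokens) is an assumption, as are the function names `rank_sentences`, `select_sentences`, and `matches`. Whitespace tokenization stands in for the Chinese word segmentation and part-of-speech tagging that the thesis performs during preprocessing.

```python
import math
from collections import Counter


def rank_sentences(sentences):
    """Return (weight, index) pairs sorted by descending weight.

    ASSUMPTION: weight = mean in-document frequency of the sentence's tokens.
    The thesis abstract does not state its actual weighting scheme.
    """
    tokenized = [s.lower().split() for s in sentences]
    freq = Counter(tok for toks in tokenized for tok in toks)
    scored = []
    for i, toks in enumerate(tokenized):
        weight = sum(freq[t] for t in toks) / len(toks) if toks else 0.0
        scored.append((weight, i))
    # Highest weight first; ties keep original sentence order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


def select_sentences(sentences, threshold):
    """Keep the top `threshold` fraction of sentences, in original order.

    threshold=1.0 keeps every sentence, i.e. full-text retrieval.
    """
    if not sentences:
        return []
    n = len(sentences)
    k = max(1, min(n, math.ceil(threshold * n)))
    keep = sorted(i for _, i in rank_sentences(sentences)[:k])
    return [sentences[i] for i in keep]


def matches(query, sentences, threshold):
    """True if every query term occurs in the selected sentences."""
    selected = select_sentences(sentences, threshold)
    vocab = {tok for s in selected for tok in s.lower().split()}
    return all(term in vocab for term in query.lower().split())


doc = [
    "retrieval systems index documents",
    "the weather was nice",
    "retrieval of documents uses sentence weights",
]
print(select_sentences(doc, 0.3))
print(matches("weather", doc, 0.3), matches("weather", doc, 1.0))
```

With a small threshold only the highest-weighted sentence is searched, so an off-topic term such as "weather" does not match; enlarging the threshold to 1.0 searches every sentence and the term is found.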
Keywords/Search Tags: information retrieval, important sentences, part of speech tagging, natural language, retrieval evaluation