
Research on a Combined Automatic Text Indexing Method Based on N-gram Analysis and Word Frequency Statistics

Posted on: 2010-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: X F Gao
Full Text: PDF
GTID: 2208360302457589
Subject: Information Science
Abstract/Summary:
With the development of science and technology, information has become an important resource in modern society, and the volume of information resources is growing explosively. Information processing is the key to making this information usable, and generating concise, accurate index terms is one of its central tasks. To some extent, the quality of automatic indexing determines the effectiveness of information processing and the value of information utilization. Against this background, it is important to improve automatic indexing methods so that indexing can be done at low cost and with high efficiency.

This paper therefore focuses on the technologies and methods of automatic indexing, taking text indexing as its object of study. Using comparative analysis together with experimental analysis, it discusses a new combined method of automatic text indexing based on N-gram analysis and word frequency statistics. The main content is as follows.

First, taking text and automatic indexing as the starting point, this paper reviews and summarizes automatic indexing through a detailed breakdown of its basic theory, representative methods, and research roadmap. It then identifies the problems in the development of automatic indexing, their possible solutions, and the research topic of combined automatic indexing methods.

Second, this paper systematically compares N-gram automatic indexing with word frequency statistics automatic indexing in terms of theory, approach, and implementation process, and concludes that the two methods agree in essence and have complementary strengths.
Furthermore, the author presents a new combined method of automatic text indexing that joins N-gram analysis with word frequency statistics by introducing two tools: conditional probability from statistics and entropy from information theory.

Finally, to verify the theoretical validity and the practical feasibility and effectiveness of the new method, a detailed implementation plan and process for automatic indexing was realized as a computer program. A comparative experiment then showed that the method has certain advantages in automatic indexing performance. The research in this paper is therefore innovative to a degree, and the method can serve as a reference and guide for further study of combined automatic indexing methods.
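The abstract does not state how N-gram counts, conditional probability, and entropy are actually combined, so the following Python sketch is only one plausible reading and not the author's implementation. It assumes character-level N-grams, estimates the conditional probability of the next character given an N-gram by relative frequency, and keeps N-grams that are both frequent and followed by a varied set of characters (high successor entropy). All function names and the thresholds `min_freq` and `min_entropy` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngram_counts(text, n):
    """Count every contiguous character n-gram of length n in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def successor_counts(text, n):
    """For each n-gram, count the characters that immediately follow it."""
    followers = defaultdict(Counter)
    for i in range(len(text) - n):
        followers[text[i:i + n]][text[i + n]] += 1
    return followers


def conditional_probabilities(follower_counter):
    """Estimate P(next char | n-gram) by relative frequency."""
    total = sum(follower_counter.values())
    return {ch: c / total for ch, c in follower_counter.items()}


def entropy(probabilities):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def candidate_terms(text, n=2, min_freq=2, min_entropy=0.5):
    """Return (n-gram, frequency, successor entropy) tuples.

    Assumption (not from the thesis): an n-gram is a candidate index term
    when it occurs at least min_freq times and the distribution of the
    characters following it has entropy of at least min_entropy bits.
    """
    counts = ngram_counts(text, n)
    followers = successor_counts(text, n)
    results = []
    for gram, freq in counts.items():
        if freq < min_freq:
            continue
        # An n-gram seen only at the very end of the text has no successor.
        probs = conditional_probabilities(followers[gram]) if gram in followers else {}
        h = entropy(probs.values())
        if h >= min_entropy:
            results.append((gram, freq, h))
    # Most frequent first; ties broken by higher entropy, then alphabetically.
    return sorted(results, key=lambda r: (-r[1], -r[2], r[0]))
```

Calling `candidate_terms(document_text, n=2)` on a document string returns the surviving bigrams ordered by frequency. The frequency threshold plays the role of the word frequency statistic, while the entropy filter uses the conditional distribution to discard N-grams that are almost always followed by the same character and so are likely fragments of a longer term.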
Keywords/Search Tags: Automatic indexing, N-gram analysis, Word frequency statistics, Combined text indexing