
The Exploring Of Master Copy Recommendation System Of Semi-automatic Collation In The Full-text Database Of Documents Of South China Sea

Posted on: 2016-10-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y H Wu
Full Text: PDF
GTID: 2308330461457662
Subject: Library and Information Science
Abstract/Summary:
Building a full-text database of documents is the most basic step in providing knowledge and information services. However, full-text input has a high error rate, and the collation workload is immense. Many words cannot be recognized from the document images. At the same time, documents about the same event are often passed repeatedly among official departments by copying, printing and so on. Based on this observation, this thesis uses the large amount of text already in the full-text database of documents of the South China Sea to perform text segmentation and text similarity calculation, and attempts a reference-object-based method to recommend a master copy. First, building on the existing automatic collation prototype system, this thesis constructs a prototype system based on master copy recommendation. It then surveys the literature on text segmentation algorithms, selects a segmentation algorithm based on Hidden Markov Models, and runs the experiment with a stop-word dictionary specific to the South China Sea. Second, to choose a text similarity calculation method suited to this experiment, the thesis compares various text similarity approaches, selects an appropriate calculation model, and builds a South China Sea subject classification system. Storing the text vectors in an inverted index improves the efficiency of similarity calculation. Finally, the experiment demonstrates that master copy recommendation is a feasible way to resolve some of the unidentified words in the documents of the South China Sea.
Keywords/Search Tags: Text Similarity, Master Copy Recommendation, Text Segmentation, Subject Classification System
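
The abstract mentions storing text vectors in an inverted index to speed up similarity calculation. The following is a minimal sketch of that general idea, not the thesis's actual implementation. It assumes the documents are already segmented into tokens with stop words removed, uses raw term frequencies as vector weights, and ranks candidates by cosine similarity; the function names `build_index` and `recommend` and the sample tokens are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index (term -> {doc_id: term frequency}) and vector norms.

    `docs` maps a document id to its list of tokens (assumed pre-segmented).
    """
    index = defaultdict(dict)
    norms = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        for term, tf in counts.items():
            index[term][doc_id] = tf
        norms[doc_id] = math.sqrt(sum(tf * tf for tf in counts.values()))
    return index, norms


def recommend(query_tokens, index, norms, top_k=3):
    """Return up to `top_k` (doc_id, cosine similarity) pairs, best first."""
    counts = Counter(query_tokens)
    q_norm = math.sqrt(sum(tf * tf for tf in counts.values()))
    if q_norm == 0:
        return []
    # Only documents sharing at least one term with the query are visited,
    # which is where the inverted index saves work.
    dots = defaultdict(float)
    for term, q_tf in counts.items():
        for doc_id, d_tf in index.get(term, {}).items():
            dots[doc_id] += q_tf * d_tf
    scores = [(doc_id, dot / (q_norm * norms[doc_id])) for doc_id, dot in dots.items()]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_k]


# Hypothetical, already-segmented documents; the query stands for a text
# in which one word could not be recognized and was dropped.
docs = {
    "a": ["island", "reef", "survey", "report"],
    "b": ["island", "reef", "survey", "notice"],
    "c": ["fishery", "tax"],
}
index, norms = build_index(docs)
result = recommend(["island", "reef", "report"], index, norms)
```

In this sketch the highest-ranked document would be offered as the master copy candidate against which an unrecognized word can be checked. Documents that share no terms with the query are never scored.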