Technical Documents Classification And Finding Repetition

Posted on: 2008-02-01 | Degree: Master | Type: Thesis
Country: China | Candidate: Q P Lv | Full Text: PDF
GTID: 2178360212986193 | Subject: Management Science and Engineering
Abstract/Summary:
Scientific and technological activities differ in form and content, and their goals are equally diverse. To help evaluate the scientific, social, and economic value of scientific and technological documents, this thesis proposes a model and method for classifying such documents and for finding repetition among them. The thesis adopts an improved Maximal Match Algorithm for text segmentation, a keyword-matching segmentation method, and uses it to obtain word-frequency statistics for each document. An application-field Hamming code model then converts each document into a Hamming code vector; the similarity between documents is computed from the Hamming distance between their vectors, and documents are assigned to application fields accordingly. In addition, a three-layer neural network classifies all scientific and technological documents by type, which supports professors in evaluating a document's value according to its type. To speed up the search for similar documents, the system stores the documents' Hamming vectors in a tree built with a Hierarchical Clustering Algorithm. Both efficiency and accuracy are reported to be greatly improved.
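The segmentation and similarity steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the Maximal Match Algorithm was improved or how the Hamming code vectors are constructed, so the code below makes two assumptions: segmentation is plain forward maximal matching against a keyword vocabulary, and each bit of a document's code marks whether one field keyword occurs in it. All function names and the sample vocabulary are hypothetical.

```python
from collections import Counter


def forward_max_match(text, vocabulary):
    """Plain forward maximal matching (not the thesis's improved variant).

    At each position, take the longest vocabulary entry that matches;
    fall back to a single character when nothing matches.
    """
    max_len = max((len(word) for word in vocabulary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


def hamming_code(tokens, field_keywords):
    """Assumed encoding: bit k is 1 when the k-th field keyword occurs."""
    counts = Counter(tokens)  # word-frequency statistics of the document
    return [1 if counts[keyword] > 0 else 0 for keyword in field_keywords]


def hamming_distance(a, b):
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def similarity(a, b):
    """Similarity in [0, 1]: one minus the normalised Hamming distance."""
    if not a:
        raise ValueError("codes must not be empty")
    return 1.0 - hamming_distance(a, b) / len(a)


# Hypothetical vocabulary and field keywords, for illustration only.
vocabulary = {"neural", "network", "neuralnetwork", "text", "classification"}
field_keywords = ["neuralnetwork", "text", "classification", "database"]

tokens = forward_max_match("neuralnetworktextclassification", vocabulary)
code = hamming_code(tokens, field_keywords)
other_code = [1, 0, 1, 1]  # code of a second, made-up document
score = similarity(code, other_code)
```

Under these assumptions, two documents with identical codes score 1.0 and documents differing in every bit score 0.0. A threshold on the score could then flag likely repetition, although the abstract does not state which threshold, if any, was used.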
Keywords/Search Tags: Maximal Match Algorithm, Hamming Distance, Text Similarity, Artificial Neural Networks, Error Back Propagation Algorithm