
Word Segmentation and Sentence Similarity Algorithms in a Network Learning Platform

Posted on: 2013-02-06 | Degree: Master | Type: Thesis
Country: China | Candidate: N Huang | Full Text: PDF
GTID: 2218330374461929 | Subject: Computer application technology
Abstract/Summary:
As society becomes increasingly information-driven, the demand for and dependence on information keep growing, and retrieving useful information quickly and efficiently from massive data has become a research focus. Language is the main carrier of information, so the demand for language processing is rising as well. With the spread of computers and the Internet, Chinese information processing technology faces new and greater challenges and opportunities. Chinese information retrieval helps users find useful knowledge on their own. As a basis for natural language understanding, sentence similarity computation is a popular and difficult problem in information processing research, and progress on it directly affects intelligent question answering systems, machine translation, and information retrieval. Because the Chinese language is complex and continues to evolve, similarity computation is still at an early stage of application. Given the importance of information processing to national development, in-depth research on similarity computation is necessary. Chinese sentence processing includes word segmentation and sentence similarity computation. Sentence similarity is computed on correctly segmented words, so segmentation quality determines the accuracy of the similarity results. This thesis mainly discusses word segmentation and sentence similarity, and experiments verify the effectiveness of the proposed segmentation algorithm and sentence similarity calculation method.
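For orientation, the baseline that the thesis builds on, maximum matching segmentation, can be sketched as below. This is the textbook forward variant, not the thesis's word-length-based algorithm, whose details the abstract does not give. The function name `forward_max_match`, the toy dictionary, and the choice to derive the window from the longest dictionary entry are assumptions made for illustration.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy forward maximum matching over a set of known words.

    Illustrative sketch only: `dictionary` is a plain set of strings,
    and unknown characters fall back to single-character tokens.
    """
    if max_len is None:
        # Assumption: use the longest dictionary entry as the window
        # instead of a fixed constant.
        max_len = max((len(w) for w in dictionary), default=1)
    max_len = max(1, max_len)

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


toy_dictionary = {"网络", "学习", "平台", "学习平台"}
print(forward_max_match("网络学习平台", toy_dictionary))
print(forward_max_match("网络学习平台", toy_dictionary, max_len=2))
```

With a fixed window of 2, the four-character entry can never be matched, which is one way a hard word-length limit affects the result.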
Finally, the sentence similarity calculation is applied to the "question-answer" library of an intelligent question answering system for online learning. The main work of the thesis is as follows:
(1) A segmentation algorithm based on word length is proposed, which overcomes the word-length limit of the maximum matching segmentation algorithm.
(2) A multi-factor semantic similarity algorithm is proposed. The method combines three aspects, word-form (keyword) similarity, semantic similarity, and sentence-level similarity, to compute the similarity of sentence meaning.
(3) Similarity calculation methods based on keywords, sentences, and semantics are studied in detail. For word-form (keyword) similarity, approaches based on the vector space model consider only keyword frequency and other surface information. This thesis also makes full use of keyword frequency, word spacing, word order, and sentence length, so that the word-form similarity results are more accurate. For semantic similarity, different keywords contribute differently to a sentence, so a weight for each word in the sentence is introduced when computing sentence semantic similarity.
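A minimal sketch of how several surface factors might be combined into one word-form score is given below. The abstract does not state the thesis's formulas, so every formula here (overlap ratio, adjacent-inversion order score, length ratio) and the weights `(0.6, 0.2, 0.2)` are assumptions chosen for illustration. The semantic component is left out because it needs a lexical resource that the abstract does not name. Inputs are assumed to be already-segmented word lists.

```python
from collections import Counter


def overlap_sim(a, b):
    """Share of words the two sentences have in common (multiset overlap)."""
    if not a or not b:
        return 0.0
    common = sum((Counter(a) & Counter(b)).values())
    return 2.0 * common / (len(a) + len(b))


def length_sim(a, b):
    """1.0 for equal lengths, decreasing as the lengths diverge."""
    if not a and not b:
        return 1.0
    return 1.0 - abs(len(a) - len(b)) / (len(a) + len(b))


def order_sim(a, b):
    """Fraction of adjacent shared words that keep the same relative order."""
    count_a, count_b = Counter(a), Counter(b)
    # Only words that occur exactly once in both sentences are compared.
    shared = [w for w in a if count_a[w] == 1 and count_b[w] == 1]
    if not shared:
        return 0.0
    if len(shared) == 1:
        return 1.0
    positions = [b.index(w) for w in shared]
    inversions = sum(1 for x, y in zip(positions, positions[1:]) if x > y)
    return 1.0 - inversions / (len(shared) - 1)


def surface_similarity(a, b, weights=(0.6, 0.2, 0.2)):
    """Weighted sum of the three factors; the weights are hypothetical."""
    w_overlap, w_order, w_length = weights
    return (w_overlap * overlap_sim(a, b)
            + w_order * order_sim(a, b)
            + w_length * length_sim(a, b))


question = ["如何", "提交", "作业"]
print(surface_similarity(question, ["如何", "提交", "作业"]))
print(surface_similarity(question, ["作业", "提交", "如何"]))
```

A bag-of-words measure would give both pairs the same score; the order term is what separates the reordered sentence from the identical one.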
Keywords/Search Tags:distance learning platform, segmentation algorithm, semantic similarity, maximum matching