Information Extraction Of Academic Activities Transaction In The Chinese Document

Posted on: 2014-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: S M Tang
Full Text: PDF
GTID: 2268330425972950
Subject: Electronics and Communications Engineering
Abstract/Summary:
The academic relationship network is a kind of social relationship network, and it is hidden in the large volume of information about academic activity transactions. With the rapid development of science and technology evaluation, analyzing potential academic relationships and constructing academic networks has become an increasingly important research topic. Extracting information from academic activity transactions is therefore an essential step in building an academic relationship network. Because this information generally exists as text, this thesis proposes a method based on Conditional Random Fields (CRFs) for extracting it from Chinese documents.

Since descriptions of academic activity transactions exhibit long-distance dependencies, we adopt CRFs as the sequence labeling model for extraction. The thesis describes how the training and test data are prepared, and studies in particular the principles and methods of feature template design under the requirements of the CRF++ tool. In addition, regular expression matching is used for text preprocessing. Extraction with the best template gives relatively satisfactory results.

However, training data for CRFs is usually collected manually, which is time-consuming and cannot cover every format in which the statements appear in practice. We therefore use a semi-supervised learning algorithm based on K-nearest neighbor (KNN) to improve the collection of training data by gathering new valid training data from the test data. First, a CRF model trained on a small amount of training data labels the validation and test sequences. The labeled validation data is then used to train classification rules that select the important unlabeled data as new training data, which is added to the original training set to make it more effective. Experiments show that this method improves both the efficiency of collecting training data and the accuracy of sequence tagging for academic activity transactions in Chinese documents.
Keywords/Search Tags: information of academic activities transaction, regular matching, Conditional Random Fields, semi-supervised learning algorithm, K-nearest neighbor
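
The KNN-based selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which features or selection rule the thesis uses, so everything here is an assumption made for illustration: each CRF-labeled sequence is represented by two hypothetical confidence features, each validation sequence carries a flag saying whether the CRF labeling was correct, and an unlabeled sequence is kept as new training data when the majority of its k nearest validation neighbors were labeled correctly. The function names and the sample values are likewise hypothetical.

```python
from collections import Counter
from math import dist


def knn_predict(labeled_points, query, k=3):
    """Return the majority label among the k labeled points nearest to query.

    labeled_points: list of (feature_vector, label) pairs.
    query: feature vector of the same length as the stored ones.
    """
    nearest = sorted(labeled_points, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def select_new_training_data(validation, candidates, k=3):
    """Pick candidate sequences whose CRF labeling is predicted to be reliable.

    validation: list of (features, crf_was_correct) pairs built from the
        validation data, where the true labels are known.
    candidates: list of (sequence_id, features) pairs built from CRF-labeled
        sequences without gold labels.
    """
    return [
        seq_id
        for seq_id, features in candidates
        if knn_predict(validation, features, k)
    ]


# Hypothetical features: (sequence confidence, lowest per-token confidence).
validation = [
    ((0.95, 0.90), True),
    ((0.92, 0.85), True),
    ((0.88, 0.80), True),
    ((0.55, 0.30), False),
    ((0.60, 0.35), False),
    ((0.40, 0.20), False),
]
candidates = [
    ("s1", (0.93, 0.88)),
    ("s2", (0.50, 0.25)),
    ("s3", (0.90, 0.79)),
]

selected = select_new_training_data(validation, candidates)
print(selected)
```

In a full pipeline under these assumptions, the selected sequences and their CRF-assigned tags would be appended to the original training set and the CRF model retrained.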