AutoLink Semi-supervised Multi-label Study Of Literature Research And Implementation Methods

Posted on:2015-02-16

Degree:Master

Type:Thesis

Country:China

Candidate:M C Zhang

GTID:2268330428977018

Subject:Computer application technology

Abstract/Summary:

As interdisciplinary collaboration deepens, many documents span several disciplines, and the literature held in databases grows by millions of items every year. Automatic extraction of a deep faceted classification tree for a field is now possible. How to automatically link millions of documents to that faceted classification tree, so that they can be retrieved quickly, remains an open problem. Through analysis, the dissertation recasts the automatic linking of literature as a multi-label classification problem. However, we can neither label a large amount of data to train the classifiers nor completely ignore the guidance that a small amount of labeled data provides. In the era of big data, both the accuracy of an algorithm and its running time must be considered. Therefore, this dissertation studies a semi-supervised multi-label learning algorithm based on a distributed framework. The detailed work is as follows:

1) The dissertation analyzes the existing strategies for multi-label learning algorithms and, according to their respective advantages and disadvantages, adopts the first-order strategy as the research plan.

2) Considering the shortcomings of the label propagation algorithm, the effect of labeled and unlabeled data on propagation, and the clustering hypothesis, the dissertation puts forward a multi-level label propagation algorithm based on data reconstruction. Experiments show that this improvement to label propagation is effective: it improves accuracy without increasing time complexity.

3) The dissertation also analyzes Hadoop's two core components, the distributed file system and MapReduce. Using laboratory equipment, the author successfully sets up a distributed computing environment of three nodes.

4) The author improves the matrix multiplication method under the distributed framework and turns the multi-level label propagation algorithm based on data reconstruction into a semi-supervised label learning algorithm. To improve classification accuracy, the author extracts label-specific features before classification, which reduces the data dimensionality. Compared with existing multi-label algorithms, the multi-label learning algorithm built by the author performs better and can handle massive data. The larger the data set and the more hosts in the cluster, the greater the speedup.

5) Using Lucene, we develop a prototype system for the automatic linking of literature that supports keyword retrieval and faceted retrieval. It first uses the DFT-Extractor system to obtain the required faceted classification tree. The semi-supervised label learning algorithm is embedded in the construction of the faceted index. Tests on ACM data further show that the method is effective. It also lowers hardware requirements, reduces costs, and has greater practical value.
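
For orientation, the sketch below shows ordinary graph-based label propagation applied to multi-label data. It is a textbook baseline, not the multi-level, data-reconstruction variant proposed in the dissertation, whose details the abstract does not give. The Gaussian affinity, the update F <- alpha * S * F + (1 - alpha) * Y, the relative score threshold, and every name and parameter value here (`gaussian_affinity`, `propagate`, `predict`, `alpha`, `sigma`, `ratio`) are assumptions chosen for illustration. The product S * F inside the loop is the kind of matrix multiplication that the abstract says is distributed with MapReduce; here it is computed on a single machine.

```python
import math


def gaussian_affinity(points, sigma=1.0):
    """Dense affinity matrix W with a Gaussian kernel and a zero diagonal."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            W[i][j] = W[j][i] = math.exp(-d2 / (2 * sigma ** 2))
    return W


def normalize(W):
    """Symmetric normalization S = D^(-1/2) W D^(-1/2)."""
    n = len(W)
    deg = [sum(row) for row in W]
    return [
        [W[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def propagate(points, Y, alpha=0.9, sigma=1.0, iters=50):
    """Iterate F <- alpha * S * F + (1 - alpha) * Y for a fixed number of steps.

    Y is an n x q 0/1 matrix; rows of unlabeled points are all zeros.
    """
    S = normalize(gaussian_affinity(points, sigma))
    n, q = len(Y), len(Y[0])
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [
            [alpha * sum(S[i][k] * F[k][c] for k in range(n)) + (1 - alpha) * Y[i][c]
             for c in range(q)]
            for i in range(n)
        ]
    return F


def predict(F, ratio=0.5):
    """Assign every label whose score is at least `ratio` times the row maximum."""
    out = []
    for row in F:
        top = max(row)
        out.append([1 if top > 0 and s >= ratio * top else 0 for s in row])
    return out


# Toy data: two well-separated clusters, one labeled point in each.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
          (5.0, 5.0), (5.5, 5.0), (5.0, 5.5)]
Y = [[1, 1, 0], [0, 0, 0], [0, 0, 0],
     [0, 0, 1], [0, 0, 0], [0, 0, 0]]
labels = predict(propagate(points, Y))
```

On this toy data each unlabeled point should inherit the label set of the single labeled point in its own cluster, because the affinity between the two clusters is negligible. A real document collection would need a sparse neighborhood graph rather than the dense matrix used here.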

Keywords/Search Tags:

faceted retrieval, semi-supervised learning, multi-label learning, label-specific features, Hadoop

Related items

1	Research On Several Key Issues Of Multi-label Learning For Limited Supervised Information
2	Research And Application Of Multi-label Learning Algorithm
3	Research On Semi-Supervised Multi-Label Feature Selection Algorithm
4	Research On Multi-Label Learning Algorithms With Distance Metric Learning
5	Research On The Utilization Techniques Of Partial Label Data
6	Research And Application Of Image Classification Algorithm Based On Semi-supervised Learning
7	Research On Multi-label Classification With Incomplete Label Information
8	Research On Semi-supervised Multi-label Propagation Algorithm Based On LTSA
9	Research On Multi-label Learning Algorithms With Ensemble Learning
10	Research On Semi-supervised Label Distribution Learning And Label Enhancement Algorith