Research On The Key Problems Of Ensemble Learning For Cross-lingual Text Mining

Posted on: 2016-05-18  Degree: Master  Type: Thesis
Country: China  Candidate: W L Gui
GTID: 2308330467996726  Subject: Computer technology
Abstract/Summary:
With the growth of the Internet and the globalization of information, online information resources are becoming richer in both type and volume, and information sharing between countries is increasing rapidly. International exchanges in academia, business, and politics are also becoming more frequent, while the languages used are increasingly diverse and unevenly distributed. The language barrier limits people's effective access to information and prevents multilingual information from realizing its full value. Cross-language learning has therefore become a pressing need, and cross-lingual text mining has become a research topic in machine learning as a way to learn from cross-language text more effectively.

In this thesis, we treat each language of a cross-lingual text data set as a view and exploit the relationships between views, so that information from the other views can improve learning performance on a given view. We use stratified sampling to draw features from the different views to form each feature subset, so that every feature subset characterizes all views well. Motivated by ensemble learning, we obtain a learning result on each component data set, built from one feature subset, and integrate these results for better performance. Following this idea, we present two methods for cross-lingual text mining: cross-lingual classification with the Stratified Sampling-based Random Forest (SS-RF) algorithm, and cross-lingual clustering with the Stratified Sampling-based Cluster Ensemble (SSCE-CLC) algorithm. A series of experiments on real-world cross-lingual text data sets shows that the proposed methods outperform state-of-the-art multi-view text mining methods.
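The stratified feature sampling idea can be illustrated with a minimal sketch. The abstract does not give the exact sampling scheme, so this example assumes proportional allocation: each view (language) contributes features in proportion to its size, with at least one feature per view. The function names, the view sizes, and the majority-vote combiner are hypothetical and chosen only for illustration; this is not the SS-RF or SSCE-CLC algorithm itself.

```python
import random
from collections import Counter


def stratified_feature_subset(view_sizes, subset_size, rng):
    """Draw feature indices from every view, proportionally to view size.

    view_sizes  -- number of features in each view (one view per language)
    subset_size -- approximate total number of features to draw
    Returns sorted column indices into the concatenated feature matrix.
    Assumption: proportional allocation; rounding means the total can
    differ slightly from subset_size.
    """
    total = sum(view_sizes)
    subset, offset = [], 0
    for size in view_sizes:
        # Keep at least one feature per view so every language is represented.
        k = min(size, max(1, round(subset_size * size / total)))
        # Sample without replacement inside this view, then shift to global indices.
        subset.extend(offset + i for i in rng.sample(range(size), k))
        offset += size
    return sorted(subset)


def majority_vote(predictions):
    """Combine the labels predicted by the component learners for one document."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)
view_sizes = [1000, 600, 400]  # hypothetical vocabulary sizes for three languages
subsets = [stratified_feature_subset(view_sizes, 100, rng) for _ in range(5)]
```

Each entry of `subsets` would define one component data set. A base learner is trained on each, and the per-component outputs are then combined, for example with `majority_vote` for classification.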
Keywords/Search Tags: Cross-lingual Text, Ensemble Learning, Stratified Sampling, Random Forest, Cluster Ensemble