
Research On Cross-domain Text Classification Based On Multi-topic Spaces

Posted on: 2018-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: Q Q Yang
Full Text: PDF
GTID: 2348330542992577
Subject: Computer application technology
Abstract/Summary:
Traditional text classification methods require that the source and target domains share the same feature space and probability distribution, and that the source domain has enough labeled data. However, these assumptions do not hold in many real applications. Cross-domain classification methods, which use the labels of the source domain to train an accurate classifier for the target domain, have therefore received extensive attention. In cross-domain classification, feature representation is an effective way to achieve knowledge transfer, and methods based on high-level concept spaces, such as topic spaces, are more effective than methods based on raw feature spaces. These methods, however, use a topic model to construct a single high-level feature space. Probabilistic topic models such as PLSA and LDA are sensitive to their initial values, and the topic representation they produce is incomplete. It is therefore valuable to construct multiple topic spaces and learn a more robust cross-domain classification model. This thesis studies multi-topic-space methods for cross-domain classification. The main contributions are as follows:

(1) To address the weaknesses of topic models, a novel cross-domain text classification algorithm based on re-learning over multiple layers of topic spaces is proposed. It first extracts multiple topic spaces, then uses non-negative matrix factorization to re-learn them into a better topic space, and finally builds a mapping from the re-learned topic spaces to achieve cross-domain text classification. Experimental results show the feasibility and effectiveness of the proposed method.

(2) To address incomplete semantic representation and the deviation introduced by single-bridge mapping, a new approach based on multi-bridge mapping for cross-domain text classification is proposed. It first extracts both multi-layer shared topics and domain-specific topics, and then builds multiple mappings between the domain-specific topics of different domains, using the multi-layer shared topics as bridges. Experiments on the 20 Newsgroups and Reuters-21578 datasets demonstrate the effectiveness of the proposed approach.
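
The re-learning step in contribution (1) can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the document-topic vectors from several topic-model runs are concatenated, and a plain non-negative factorization with multiplicative updates compresses them into one smaller topic space. The helper names (`stack_topic_spaces`, `nmf`) and the toy numbers are hypothetical and are not taken from the thesis.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def stack_topic_spaces(spaces):
    """Concatenate each document's topic vectors from several topic-model runs."""
    return [sum((space[d] for space in spaces), []) for d in range(len(spaces[0]))]


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Approximate non-negative v (n x m) by w (n x rank) times h (rank x m).

    Uses multiplicative updates for the squared-error objective. This is an
    assumed stand-in for the re-learning step, not the thesis's exact model.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # Update h with w held fixed.
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # Update w with the new h held fixed.
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Toy document-topic matrices from two hypothetical topic-model runs over the
# same four documents. The second run orders its topics differently, as runs
# with different initial values often do.
run_a = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
run_b = [[0.2, 0.8], [0.1, 0.9], [0.7, 0.3], [0.9, 0.1]]

stacked = stack_topic_spaces([run_a, run_b])   # 4 documents x 4 stacked topics
relearned, basis = nmf(stacked, rank=2)        # 4 documents x 2 re-learned topics
```

In this sketch the rows of `relearned` would serve as the new document features: a classifier trained on the rows of labeled source documents could then be applied to the rows of target documents.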
Keywords/Search Tags:text classification, cross-domain classification, topic models