
Research On Multilingual Text Clustering

Posted on: 2014-02-06
Degree: Master
Type: Thesis
Country: China
Candidate: J X Wan
Full Text: PDF
GTID: 2248330395495920
Subject: Information Science
Abstract/Summary:
With the continued development and spread of the Internet and the deepening trend of globalization, information resources in many languages are growing explosively online. In the information age, information means opportunity and success, so people are eager to find interesting and valuable information quickly and accurately within this mass of network resources. Multilingual text clustering builds on traditional clustering techniques while adapting them to a multilingual information environment, and is therefore better able to meet this demand.

This thesis first systematically reviews the state of research on multilingual text clustering in China and abroad. It then introduces the stages of text clustering and the key techniques involved: text feature representation, similarity calculation, feature dimension reduction, clustering algorithms, and methods for evaluating clustering quality. Next, it discusses the representation of multilingual text, which is the core of multilingual text clustering. Multilingual text representation rests on two ideas: one is to convert multilingual text into single-language text, and the other is to use semantic analysis to find the semantic associations between texts in different languages. The thesis describes the Latent Semantic Indexing (LSI) method in detail, including its mathematical basis and rationale.

Finally, 2,736 bilingual news texts are selected, and experiments based on the two models are carried out with the K-Means clustering method. In the first experiment, a translation engine translates the bilingual news texts into a single language. In the second, LSI is used to find the semantic associations between the multilingual texts. The results show that converting multilingual text to a single language substantially improves the clustering result, whereas, for a number of reasons, LSI does not achieve good results.
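The first model (convert to a single language, then cluster) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the thesis: the tiny `TOY_LEXICON` stands in for a real translation engine, the four-document `DOCS` corpus is invented, the TF-IDF weighting and cosine similarity are common defaults, and the k-means variant uses a deterministic farthest-first initialisation rather than whatever settings the author used.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for a real translation engine.
TOY_LEXICON = {
    "fútbol": "football", "partido": "match", "equipo": "team",
    "mercado": "market", "acciones": "stocks", "banco": "bank",
}

# Invented corpus: two sports texts and two finance texts, in two languages.
DOCS = [
    "football team match win",
    "equipo fútbol partido",
    "bank stocks market",
    "mercado acciones banco crisis",
]


def normalise(vec):
    """Scale a sparse vector (term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()}


def tfidf_vectors(token_lists):
    """Build unit-length TF-IDF vectors with a smoothed IDF."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        weights = {t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                   for t, c in tf.items()}
        vectors.append(normalise(weights))
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, max_iter=20):
    """K-means on cosine similarity with farthest-first initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Pick the document least similar to every centroid chosen so far.
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                total = Counter()
                for v in members:
                    for t, w in v.items():
                        total[t] += w
                centroids[j] = normalise(total)
    return labels


def cluster_documents(docs, k, translate=True):
    """Tokenise, optionally map tokens to one language, then cluster."""
    token_lists = [d.lower().split() for d in docs]
    if translate:
        token_lists = [[TOY_LEXICON.get(t, t) for t in tokens]
                       for tokens in token_lists]
    return kmeans(tfidf_vectors(token_lists), k)
```

Calling `cluster_documents(DOCS, 2)` groups the toy documents by topic across languages, while `cluster_documents(DOCS, 2, translate=False)` cannot place the English and Spanish sports texts together because they share no vocabulary. This mirrors, on a toy scale only, why a shared representation matters for multilingual clustering.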
Keywords/Search Tags: Multilingual text clustering, Cross-language text clustering, Text clustering, Latent Semantic Indexing (LSI), Multilanguage text representation