Design And Implementation For Classification Of Large-scale Short Text Data

Posted on: 2018-03-15
Degree: Master
Type: Thesis
Country: China
Candidate: L Liu
GTID: 2348330518496296
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of Internet technology and its applications, short text data has become a new mode of network application, and its fast growth and large scale are new characteristics of the present Chinese network. Text data is no longer just a resource stored in a computer for people to operate on and analyze; it has become a tool that makes information exchange more convenient and efficient. Short text differs from traditional text in many ways because of its length and sentence structure. Because short texts contain few words, carry little information, and describe concepts weakly, commonly used classification methods cannot keep up with the rapid growth in text scale. Compared with traditional text classification, short text classification also faces problems such as large data size and a complicated data environment. Research on feature selection and data mining for such data is therefore of great significance and practical value.

This thesis discusses classification techniques for large-scale short text from both theoretical and practical perspectives. First, it proposes a text classification method based on the Sim-bu-tree algorithm and GerayC parameters and analyzes its advantages. Then it describes in detail how the ant colony algorithm is implemented in the Hadoop environment. For large-scale data applications, a text categorization method based on a parallel ant colony algorithm is proposed. Simulation experiments then verify the validity and feasibility of the algorithm. In summary, this thesis presents a method for large-scale short text classification with high accuracy and fast classification speed, providing a new approach to large-scale short text data classification.
Keywords/Search Tags: Short text classification, Self-correlation analysis, Parallelization, Ant colony algorithm