
Research On Determining Method Of Relationships Between Contacts Based On SMS Text Vector

Posted on: 2016-05-11
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhang
GTID: 2348330479954723
Subject: Computer technology
Abstract/Summary:
Short Message Service (SMS) is an important communication tool in daily life. Analyzing message content to infer the likely real-life relationship between contacts makes it possible to reuse message data. Such relationships can in principle be determined with the conventional text classification methods used in data mining, but those methods are, to a certain extent, ill-suited to classifying message content. It is therefore of great significance to study a classification method designed for text messages and to use it to determine the relationship category to which contacts belong.

To support relationship classification based on SMS content, the SMS text must first be segmented to extract the valuable words. Segmentation is performed with the NLPIR segmentation system together with a purpose-built user dictionary. Because some words are meaningless for subsequent analysis, a suitable stop-word list is built and applied to the segmentation result, yielding the final word collection used to determine the contact relationship.

Determining contact relationships by text categorization requires an appropriate SMS text feature vector for the text collection. Based on the characteristics of SMS text and the requirements of relationship determination, 14 feature items in four categories are selected to form the feature vector, and a method for calculating the weight of each feature item, based on the idea of contribution, is given.

Once the feature vector of each SMS text has been obtained, the very large amount of data makes efficiency a concern. An improved KNN algorithm based on k-means is therefore proposed, which effectively reduces the required training sample set while maintaining the accuracy of the final classification. Experiments show that, with the classification categories defined as the four categories of family, friends, colleagues, and strangers, the method gives a relatively accurate determination of the relationship between two specific contacts in mass SMS data.
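The abstract does not spell out how k-means is combined with KNN, so the following is only a minimal sketch of one common reading: cluster each class's training vectors with k-means, keep only the labelled centroids as a reduced training set, and run ordinary KNN against those centroids. The function names (`kmeans`, `condense`, `knn_predict`), the parameter `clusters_per_class`, and the two-dimensional toy vectors are all assumptions made for illustration; the thesis itself uses a 14-item feature vector with contribution-based weights, which is not reproduced here.

```python
import math
import random
from collections import Counter


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, n_clusters, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns the list of centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        centroids = [
            [sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


def condense(training, clusters_per_class=2):
    """Replace each class's samples with a few labelled k-means centroids.

    Assumption: this per-class condensation stands in for the thesis's
    (unspecified) way of shrinking the training set.
    """
    by_label = {}
    for vec, label in training:
        by_label.setdefault(label, []).append(vec)
    reduced = []
    for label, vecs in by_label.items():
        n = min(clusters_per_class, len(vecs))
        reduced.extend((c, label) for c in kmeans(vecs, n))
    return reduced


def knn_predict(reduced, query, k=3):
    """Majority vote among the k nearest labelled centroids."""
    nearest = sorted(reduced, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 2-D stand-ins for SMS feature vectors.
TOY_TRAINING = [
    ([0, 0], "family"), ([0, 1], "family"),
    ([1, 0], "family"), ([1, 1], "family"),
    ([10, 10], "friends"), ([10, 11], "friends"),
    ([11, 10], "friends"), ([11, 11], "friends"),
    ([0, 10], "colleagues"), ([0, 11], "colleagues"),
    ([1, 10], "colleagues"), ([1, 11], "colleagues"),
    ([10, 0], "strangers"), ([10, 1], "strangers"),
    ([11, 0], "strangers"), ([11, 1], "strangers"),
]
```

Calling `condense(TOY_TRAINING)` shrinks the 16 samples to 8 labelled centroids, and `knn_predict` then compares a query against those 8 points instead of the full set. On real data the saving grows with the number of samples per class, at the cost of some loss of detail near class boundaries.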
Keywords/Search Tags: SMS text, contacts relationship, feature vectors, KNN