Research And Implementation Of Typos Automatic Detection And Correction In Online Users' Comments

Posted on: 2017-12-17
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Xu
GTID: 2348330503992909
Subject: Computer technology
Abstract/Summary:
With the rapid development of e-commerce, online shopping platforms have accumulated a large volume of user comments. Mining the implicit information in these comments is valuable to both businesses and consumers, and typos in the comments are a key factor limiting the accuracy of data mining. Automatic detection and correction of typos in comments is therefore important. The problem is currently not well solved for online user comments, and improving detection and correction accuracy remains an important issue. The main work of this thesis is as follows:

(1) To address the false positives caused by weak correlation between words during detection, this thesis proposes an automatic typo detection algorithm based on word vectors. It uses word vectors to replace suspected words with their synonyms and derives a correlation measure from the word vectors. Combining this word-vector correlation with context probability lets the algorithm screen comments that contain typos and locate the typos more accurately. Compared with the existing method, detection accuracy increases by 5.03%.

(2) To address the low accuracy of automatic typo correction, this thesis proposes an automatic correction algorithm based on a weight mechanism. The algorithm merges suspected words in a comment to generate suspected merged words, which lays the foundation for obtaining the correct candidate words. When sorting candidate words that are similar in pinyin or glyph, it introduces a similarity weight between candidate words and suspected words to optimize the ranking. When determining the best candidate word, it considers the probability distribution of the candidates and selects the first candidate whose discrimination degree is large, which avoids wrongly replacing a word that was already correct. Compared with the existing method, correction accuracy increases by 24.20%.

(3) To address data noise in online user comments, this thesis builds on the data preprocessing of traditional methods and successfully filters out "water army" (paid poster) comments (15.03%), improving the accuracy of the training data. In addition, replacing attribute words of the same type in comments reduces the dimensionality of the feature vector and therefore the system's storage cost.

(4) The research has been verified experimentally and meets practical requirements. It has been applied in the Lenovo Research Institute project "User feedback analysis system", where the system runs stably and reliably.
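The detection idea in item (1) can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything here is an assumption: the toy vectors, the bigram table, both thresholds, and the helper names `context_correlation` and `detect_typos` are hypothetical, and English words stand in for Chinese ones for readability. The sketch flags a word only when its context probability is low and its word-vector correlation with the rest of the comment is also low, which is one plausible way to reduce false positives on rare but valid word pairs.

```python
import math

# Hypothetical toy word vectors; a real system would load trained embeddings.
VECTORS = {
    "phone":  [0.9, 0.1, 0.0],
    "screen": [0.8, 0.2, 0.1],
    "clear":  [0.7, 0.3, 0.1],
    "scream": [0.0, 0.1, 0.9],
}

# Hypothetical bigram probabilities P(word | previous word).
BIGRAM_PROB = {
    ("phone", "screen"): 0.20,
    ("screen", "clear"): 0.02,   # rare but valid pair
    ("phone", "scream"): 0.001,
    ("scream", "clear"): 0.001,
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def context_correlation(words, i):
    """Mean cosine similarity between words[i] and the other known words."""
    others = [w for j, w in enumerate(words) if j != i and w in VECTORS]
    if words[i] not in VECTORS or not others:
        return 0.0
    target = VECTORS[words[i]]
    return sum(cosine(target, VECTORS[w]) for w in others) / len(others)


def detect_typos(words, prob_threshold=0.05, corr_threshold=0.5):
    """Return indices of words that fail both the probability and correlation checks."""
    flagged = []
    for i in range(1, len(words)):
        prob = BIGRAM_PROB.get((words[i - 1], words[i]), 0.0)
        if prob < prob_threshold and context_correlation(words, i) < corr_threshold:
            flagged.append(i)
    return flagged
```

In this toy data, "clear" after "screen" has a low bigram probability but a high correlation with the other words, so it is not reported, while "scream" after "phone" fails both checks and is flagged.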
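The candidate ranking in item (2) can be sketched in a similar spirit. The abstract does not define the similarity weight or the discrimination degree, so this sketch swaps in simple stand-ins: a score equal to the context probability scaled by `1 + alpha * similarity`, and a margin test that accepts the top candidate only when its score is at least `margin` times the runner-up's. The `Candidate` fields, `alpha`, `margin`, and the function names are all hypothetical, and the probabilities and similarities are assumed to come from a language model and a pinyin or glyph comparison that are not shown.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    word: str
    context_prob: float  # hypothetical probability of the word in its context
    similarity: float    # hypothetical pinyin/glyph similarity to the suspected word, in [0, 1]


def weighted_score(candidate, alpha=0.5):
    """Context probability scaled by a similarity weight (an assumed formula)."""
    return candidate.context_prob * (1.0 + alpha * candidate.similarity)


def rank_candidates(candidates, alpha=0.5):
    """Sort candidates from highest to lowest weighted score."""
    return sorted(candidates, key=lambda c: weighted_score(c, alpha), reverse=True)


def choose_correction(suspected, candidates, alpha=0.5, margin=1.5):
    """Return the top candidate if it clearly stands out, else keep the original word."""
    ranked = rank_candidates(candidates, alpha)
    if not ranked:
        return suspected
    best = weighted_score(ranked[0], alpha)
    runner_up = weighted_score(ranked[1], alpha) if len(ranked) > 1 else 0.0
    # Margin test: a stand-in for the thesis's "discrimination degree".
    if best >= margin * runner_up:
        return ranked[0].word
    return suspected
```

Keeping the suspected word when no candidate clears the margin reflects the stated goal of not replacing a word that was already correct; the specific margin rule is a design choice of this sketch, not something the abstract specifies.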
Keywords/Search Tags:Online users' comments, Typos, Automatic detection algorithm, Automatic correction algorithm