
The Development And Realization Of Scenario Thesaurus Based On Machine Learning

Posted on: 2014-09-20
Degree: Master
Type: Thesis
Country: China
Candidate: X Di
Full Text: PDF
GTID: 2268330425975780
Subject: Software engineering
Abstract/Summary:
With the development of informatization and the widespread adoption of the Internet and intelligent terminals, more and more information can be obtained from clients. This vast amount of information has enormous practical value. For example, by obtaining a user’s dialogue information, the conversation scenario can be identified in order to determine the user’s behavior and intentions, and thus provide the required services. Accurate dialogue scene recognition plays an important role in optimizing human-computer interaction and promoting the development of smart products.

Based on existing theoretical studies, this paper uses the Naive Bayes algorithm and feature weights to build a self-learning scenario thesaurus, and deploys it in a self-built distributed parallel computing environment. The Naive Bayes algorithm can calculate the probability that a dialogue belongs to a dialogue scene. However, to avoid computing complicated relationships and to improve performance, the traditional Bayesian classification algorithm relies on the independence assumption that attributes are mutually conditionally independent. This assumption ignores the correlations that exist between attributes in real environments, which affects the classification accuracy of the algorithm. Therefore, an improved weighting algorithm, TF-UIDF, is proposed in this paper. Compared with the traditional weighting algorithm, TF-UIDF takes into account the different distributions of an attribute across the scenario classes, and adapts well to skew in the training texts. Evaluating the importance of attributes with TF-UIDF helps the Naive Bayes classifier filter out the less important attributes and strengthen the effect of important attributes in the classification calculation, without affecting classification performance.
Furthermore, to ensure the sustainable use of the scenario thesaurus, a self-learning module was added to the lexicon, using pre-substitution in combination with an evaluation system to ensure the effectiveness of each learning step of the scenario thesaurus. Through continuous learning and optimization, the classification results of the lexicon are kept optimal. As for the distributed environment, mainstream distributed frameworks were analysed in this paper in order to implement DaSys, a lightweight distributed parallel framework. DaSys adopts a load-balancing algorithm based on calculation type, together with redundant master service machines, giving the scene thesaurus high performance and fault tolerance.

Results showed that the TF-UIDF algorithm effectively made up for the deficiency of the Naive Bayes algorithm, and that both its adaptability to the training set and its classification accuracy are higher than those of traditional algorithms; the machine learning module also showed good optimization capability on the training set in the actual learning process. Besides the basic scenario classification and learning capabilities, the scenario thesaurus also met the high-performance requirements needed to serve highly concurrent requests.
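
The idea of combining Naive Bayes with per-attribute weights can be illustrated with a small sketch. The abstract does not give the TF-UIDF formula, so the code below substitutes a plain smoothed inverse-document-frequency weight as a stand-in; it is not the thesis's algorithm. The function names (train, classify), the sample dialogues, and the scene labels are all hypothetical and chosen only for illustration. Each term's weight is used as an exponent on its class-conditional likelihood, which is one common way to weight attributes in Naive Bayes.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set: (tokens of a dialogue, scene label).
SAMPLE_DIALOGUES = [
    (["book", "table", "dinner"], "restaurant"),
    (["menu", "dinner", "order"], "restaurant"),
    (["ticket", "train", "station"], "travel"),
    (["train", "platform", "ticket"], "travel"),
]


def train(docs):
    """Collect the counts needed by a weighted multinomial Naive Bayes model."""
    class_docs = Counter()              # scene -> number of training dialogues
    term_counts = defaultdict(Counter)  # scene -> term -> occurrence count
    doc_freq = Counter()                # term -> number of dialogues containing it
    for tokens, scene in docs:
        class_docs[scene] += 1
        term_counts[scene].update(tokens)
        doc_freq.update(set(tokens))
    n_docs = len(docs)
    # Stand-in attribute weight: smoothed IDF (an assumption, NOT TF-UIDF).
    weights = {
        term: math.log((1 + n_docs) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }
    return {
        "class_docs": class_docs,
        "term_counts": term_counts,
        "weights": weights,
        "n_docs": n_docs,
    }


def classify(model, tokens):
    """Return the scene with the highest weighted log-posterior score."""
    weights = model["weights"]
    vocab_size = len(weights)
    scores = {}
    for scene, n_scene in model["class_docs"].items():
        counts = model["term_counts"][scene]
        total = sum(counts.values())
        score = math.log(n_scene / model["n_docs"])  # log prior
        for term in tokens:
            if term not in weights:
                continue  # ignore terms never seen in training
            # Laplace-smoothed likelihood, raised to the attribute weight.
            likelihood = (counts[term] + 1) / (total + vocab_size)
            score += weights[term] * math.log(likelihood)
        scores[scene] = score
    return max(scores, key=scores.get)


model = train(SAMPLE_DIALOGUES)
print(classify(model, ["order", "dinner"]))
```

Because the weight multiplies the log-likelihood, terms with a larger weight move the score more, while a weight near zero would effectively filter a term out. A class-aware weight of the kind the abstract describes would replace the `weights` dictionary without changing the rest of the sketch.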
Keywords/Search Tags: Scenario Recognition, Naive Bayes, TF-UIDF, Machine Learning, Distributed