The Development And Realization Of Scenario Thesaurus Based On Machine Learning

Posted on:2014-09-20

Degree:Master

Type:Thesis

Country:China

Candidate:X Di

Full Text:PDF

GTID:2268330425975780

Subject:Software engineering

Abstract/Summary:

With the spread of information technology and the wide adoption of the Internet and intelligent terminals, more and more information can be obtained from client devices. This vast amount of information has enormous practical value. For example, a user's dialogue can be analyzed to identify the conversation scenario, infer the user's behavior and intentions, and provide the required services. Accurate dialogue scene recognition therefore plays an important role in optimizing human-computer interaction and promoting the development of smart products.
Building on existing theoretical studies, this thesis uses the Naive Bayes algorithm together with feature weights to implement a self-learning scenario thesaurus, and deploys it in a self-built distributed parallel computing environment. The Naive Bayes algorithm can calculate the probability that a dialogue belongs to a given dialogue scene. However, to avoid computing complicated relationships between attributes and to improve performance, the traditional Bayesian classification algorithm assumes that attributes are conditionally independent of one another. This independence assumption ignores the correlations that exist between attributes in real environments, which affects classification accuracy. Therefore, this thesis proposes an improved weighting algorithm, TF-UIDF. Compared with traditional weighting algorithms, TF-UIDF takes into account how an attribute is distributed across the different scenario classes, and it adapts well to skew in the training texts. Evaluating the importance of attributes with TF-UIDF helps the Naive Bayes classifier filter out less important attributes and strengthen the effect of important ones in the classification calculation, without affecting classification performance.
Furthermore, to ensure the sustainable use of the scenario thesaurus, a self-learning module was added to the lexicon. It uses pre-substitution in combination with an evaluation system to ensure the effectiveness of each learning pass of the scenario thesaurus. Through continuous learning and optimization, the lexicon's classification results are kept optimal. As for the distributed environment, the mainstream distributed frameworks were analyzed in order to build a lightweight distributed parallel framework, DaSys. DaSys adopts a load-balancing algorithm based on calculation type, together with redundant master service machines, which gives the scene thesaurus high performance and fault tolerance.
Results showed that the TF-UIDF algorithm effectively compensates for the deficiency of the Naive Bayes algorithm, and that both its adaptability to the training set and its classification accuracy are higher than those of traditional algorithms. The machine learning module also showed good optimization capability on the training set during the actual learning process. Beyond basic scenario classification and learning, the scenario thesaurus also met the high-performance requirements needed to serve highly concurrent requests.
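
The weighted Naive Bayes classification described in the abstract can be illustrated with a minimal sketch. The abstract does not give the TF-UIDF formula, so the per-term weights below are a caller-supplied placeholder rather than TF-UIDF itself; the class name `WeightedNaiveBayes`, the Laplace smoothing, and the toy dialogues are likewise assumptions made only for illustration. Each term's log-likelihood is scaled by its weight, so a weight of 0 filters the term out and a weight above 1 strengthens its effect.

```python
import math
from collections import Counter, defaultdict


class WeightedNaiveBayes:
    """Multinomial Naive Bayes with optional per-term weights (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant (assumed)
        self.class_docs = Counter()              # scenario -> number of training dialogues
        self.term_counts = defaultdict(Counter)  # scenario -> term -> occurrence count
        self.vocab = set()

    def fit(self, dialogues, scenarios):
        for tokens, scenario in zip(dialogues, scenarios):
            self.class_docs[scenario] += 1
            self.term_counts[scenario].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, tokens, weights=None):
        """Return log P(scenario) + sum_t w_t * log P(t | scenario) per scenario."""
        weights = weights or {}  # hypothetical weights; missing terms default to 1.0
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        scores = {}
        for scenario, n_docs in self.class_docs.items():
            counts = self.term_counts[scenario]
            total_terms = sum(counts.values())
            score = math.log(n_docs / total_docs)
            for term in tokens:
                if term not in self.vocab:
                    continue  # terms never seen in training carry no evidence
                p = (counts[term] + self.alpha) / (total_terms + self.alpha * vocab_size)
                score += weights.get(term, 1.0) * math.log(p)
            scores[scenario] = score
        return scores

    def predict(self, tokens, weights=None):
        scores = self.log_scores(tokens, weights)
        return max(scores, key=scores.get)


# Toy training data (invented for this example).
train = [
    (["book", "table", "dinner"], "restaurant"),
    (["order", "pizza", "dinner"], "restaurant"),
    (["book", "flight", "ticket"], "travel"),
    (["hotel", "flight", "check"], "travel"),
]
model = WeightedNaiveBayes().fit([d for d, _ in train], [s for _, s in train])

print(model.predict(["book", "dinner"]))
# "book" occurs equally in both scenarios, so a weight of 0 drops it from the score.
print(model.predict(["book", "flight"], weights={"book": 0.0}))
```

In this sketch the independence assumption appears as the plain sum over terms; the weights are one simple way to let more discriminative attributes count for more without modelling correlations explicitly.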

Keywords/Search Tags:

Scenario Recognition, Naive Bayes, TF-UIDF, Machine Learning, Distributed

Related items

1	Research On Optimization Of Routing Algorithm Based On Semi-Naive Bayes
2	A Human Action Recognition Method Based On Computer Vision
3	The Research On SQL Injection Detection Technology Based On Naive Bayes And LSTM Recurrent Neural Network
4	Design And Implementation Of The Email Spam Detection System Based On Naive Bayes And SVM
5	Incremental Learning Of Naive Bayes Chinese Classification System
6	Research On Network Traffic Classification Based On Machine Learning
7	The Mobile Customers Occupational Recognition Naive Bayes Algorithm-based Integration And Debugging
8	Research On The Design Of Naive Bayes Classifier Based On Memristor
9	Development Of Face Verification Algorithm On Massive Data Scenario Based On Metrix Learning And Distributed Machine Learning