Review Classification Guided By Domain Knowledge

Posted on: 2022-07-06; Degree: Master; Type: Thesis
Country: China; Candidate: Z Q Wang
GTID: 2518306332457984; Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet in the 21st century, the number of Web applications and mobile apps has grown explosively. These products have become an indispensable part of people's daily lives, bringing large economic benefits and broad room for industry growth. To win a place in a fiercely competitive market, developers need to learn user preferences promptly and make targeted updates that keep their products viable. Software review data is an important source of information about users' needs and preferences. Review analysis currently follows a general two-step process, data categorization followed by information extraction, in which review categorization is the prerequisite for using the data effectively. Classification removes noisy and redundant content from the review data and sorts the remaining information into categories, such as new user requirements and current product defects, laying a foundation for developers to better understand the reviews.

Existing research on review classification in software engineering usually focuses on the characteristics of the data itself and on the differences among reviews, selecting classifiers with different vectorization methods and different performance to complete the classification task. Although these classifiers work well for most review categories, their performance deteriorates significantly when they are applied to specific domains (sports, games, etc.). The reason is that these domains often contain domain-specific vocabulary and expressions, so a review classification method without domain knowledge has difficulty achieving ideal results.

To solve this problem, this thesis proposes a domain knowledge-guided review classification method. Starting from existing domain knowledge, the dictionary of domain-related words for a domain (or a particular product) is automatically expanded, and a review classifier is then trained to better complete the classification task in that domain. First, from the perspective of functional and non-functional requirements, valuable reviews are divided into five categories, and, according to the quality attributes defined for each category, domain-related words are manually extracted from the reviews as the seed words of the dictionary. Second, using semantic analysis, the domain dictionary is constructed and expanded by calculating text similarity over the corpus and collecting the words that are highly similar to the seed words; these words serve as a text feature in the review classification task. Finally, sentiment values are calculated and domain knowledge is introduced into the text feature representation, which is used to train the review classifier and complete review classification for different domains.

To address the domain sensitivity of current text classification, this thesis conducts a series of experiments with the proposed method to explore how domain knowledge can best be acquired and, on that basis, analyzes whether introducing domain knowledge helps the task of app review classification.
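The seed-word expansion step described above can be illustrated with a minimal sketch. The abstract does not specify how text similarity is computed, so everything here is an assumption made for illustration: the co-occurrence word vectors, the cosine similarity measure, the `threshold` parameter, and all function names (`cooccurrence_vectors`, `expand_dictionary`, `dictionary_feature`) are hypothetical and are not taken from the thesis. The sentiment values mentioned in the abstract are left out.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(reviews, window=2):
    """Build a sparse co-occurrence vector for every word in the corpus.

    Assumption: whitespace tokenization and a symmetric context window.
    """
    vectors = defaultdict(Counter)
    for review in reviews:
        tokens = review.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_dictionary(seeds, vectors, threshold=0.9):
    """Add every word whose similarity to some seed word reaches the threshold."""
    expanded = set(seeds)
    for word, vec in vectors.items():
        if word in expanded:
            continue
        if any(s in vectors and cosine(vec, vectors[s]) >= threshold for s in seeds):
            expanded.add(word)
    return expanded


def dictionary_feature(review, dictionary):
    """Fraction of tokens in a review that appear in the domain dictionary."""
    tokens = review.lower().split()
    return sum(t in dictionary for t in tokens) / len(tokens) if tokens else 0.0


# Toy corpus (invented for illustration, not data from the thesis).
reviews = [
    "the game lags during matches",
    "the game freezes during matches",
    "great app love it",
]
vectors = cooccurrence_vectors(reviews)
domain_dictionary = expand_dictionary({"lags"}, vectors, threshold=0.9)
feature = dictionary_feature("the game freezes", domain_dictionary)
```

In this toy corpus, "freezes" occurs in the same contexts as the seed word "lags", so it is added to the dictionary; the resulting dictionary-hit ratio could then be appended to whatever text features the classifier already uses. In practice a trained embedding model would likely replace the co-occurrence counts.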
Keywords: text classification, domain dictionary, seed word