Research On Text Classification Combining Title With Text And The Method Of Opinion Targets Extraction

Posted on: 2018-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: X K Yu
Full Text: PDF
GTID: 2348330515492880
Subject: Computer software and theory
Abstract/Summary:
With the development of society, information on the Internet is growing explosively. An examination of texts submitted by Internet users shows that on most websites, especially news and government websites, documents are structured: each consists of a title and a text. The text is usually a detailed description of an event; it carries rich semantic information, but its topics are diverse and it contains considerable noise. The title is usually a concise summary of the event, accurately expressed and semantically clear, so making full use of the title information is worthwhile. This thesis exploits these characteristics of the title and proposes a topic model based on both the title and the text for text classification. Because titles are distinctive in having simple statements and simple syntax, rules and syntactic dependency relations can effectively extract opinion targets from them. The main work of this thesis is as follows:

(1) For documents that have a title and a text, this thesis proposes a topic model based on the title and the text. The model obtains the topic distribution of the text and the topic distribution of the title, and uses a regulating parameter to optimize the topic distribution of the entire document. By exploiting the refined, concise topic of the title, the model reduces the influence of the text's semantic complexity and topic diversity on classification. In this way it obtains an optimal topic distribution for the whole document and improves the accuracy of text classification.

(2) Since the title is simple and its topic is clear, opinion targets are extracted from the title based on syntactic dependency relations. The thesis first obtains potential opinion targets from the title using rules and part-of-speech information. Because the potential opinion targets in the title corpus depend strongly on verbs, the thesis constructs a verb dictionary. Using the positions at which these verbs appear in the syntactic parse tree, it traverses the entire tree to identify the real opinion targets among the potential ones.

(3) The corpus of this thesis comes from a certain city's government convenience services website, which addresses problems faced by local residents, so a large number of local named entities appear in the texts. To reduce the influence of these special terms on word segmentation and syntactic dependency parsing, the thesis incorporates a large number of locally specific terms, such as community names, road names, and bus and subway names. Because word segmentation becomes more accurate, good results are achieved on both text classification and opinion target extraction.
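The abstract does not state how the regulating parameter in (1) combines the two topic distributions. As a minimal sketch, assuming a simple linear interpolation between the title and text distributions (an illustrative assumption, not necessarily the thesis's formulation), the idea could look like this. The function name combine_topic_distributions and the parameter name weight are hypothetical.

```python
from typing import List, Sequence


def combine_topic_distributions(
    title_dist: Sequence[float],
    text_dist: Sequence[float],
    weight: float,
) -> List[float]:
    """Mix a title and a text topic distribution into one document distribution.

    Assumption: the regulating parameter acts as a linear interpolation
    weight in [0, 1]; weight=1 uses only the title, weight=0 only the text.
    """
    if len(title_dist) != len(text_dist):
        raise ValueError("both distributions must cover the same topics")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")

    mixed = [
        weight * t + (1.0 - weight) * b
        for t, b in zip(title_dist, text_dist)
    ]
    total = sum(mixed)
    if total <= 0.0:
        raise ValueError("distributions must have positive total mass")
    # Renormalize so the result is a valid probability distribution
    # even if the inputs were only approximately normalized.
    return [m / total for m in mixed]
```

Under this assumption, a larger weight lets the concise title dominate a noisy, multi-topic text; the weight itself would be tuned on held-out classification accuracy.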
Keywords/Search Tags:Topic model, Text classification, Opinion targets, Named entities