
Research on Keyword Extraction Technology Oriented to Conversational Text

Posted on: 2021-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Ding
GTID: 2428330629450754
Subject: Public Security Technology
Abstract/Summary:
Instant messaging tools carry large volumes of conversational text, including harmful information such as rumors, personal attacks, fraud, scams, and reactionary speech. Many criminals even use instant messaging tools to commit crimes. The analysis of conversational text therefore plays an important role in public security opinion analysis, case investigation, and electronic evidence analysis. Keywords are one of the most efficient ways to grasp the main content of a text quickly, so research on keyword extraction for conversational text is highly valuable. This dissertation focuses on the keyword extraction task for conversational text. The specific contents are as follows:

1. To address the sparsity of conversational text vectors, poor centrality, and topic cross-talk, SCM (Segmentation-Clustering Model), a model for constructing initial clusters of conversational text, is proposed. SCM segments first and clusters second. First, a conversation segmentation algorithm is built from the temporal features and implicit features of the conversational text, and the conversational text stream is divided into fine-grained conversation fragments. Then, the DBSCAN algorithm is used to cluster these fragments into initial clusters of conversational text.

2. Because traditional algorithms adapt poorly to conversational text, NBLT (Naive Bayes-LDA-TFIDF), a model for keyword extraction and crime-type classification of crime text, is proposed. NBLT performs keyword extraction on the output of SCM and combines supervised and unsupervised extraction: the keywords extracted by the Naive Bayes algorithm are combined with those produced by an unsupervised method that fuses multiple keyword extraction algorithms. NBLT effectively addresses the difficulties of extracting keywords from conversational text. Furthermore, a Bayesian classifier is trained on the extracted keyword sets using conversational text from homicide cases, corruption cases, and drug-related cases, which realizes both keyword extraction and classification of crime conversational text.

3. Based on the above two models, a crime conversational text analysis system was designed and developed. The system preprocesses the test text, uses the SCM algorithm to segment and cluster conversational text streams, and uses the NBLT algorithm for keyword extraction and crime-type recognition. The system also provides word segmentation and annotation based on jieba, entity recognition, word frequency statistics, word cloud display, and text sentiment analysis based on the Baidu AI sentiment analysis tool, which facilitates the analysis and application of conversational text streams in public security work.

The following experiments were conducted using a public QQ group chat dataset and crime movie subtitles as the corpus: conversational text feature extraction and feature-based segmentation; conversation clustering; crime conversational text keyword extraction; and crime conversational text classification. The experimental results show that the proposed and adopted methods perform well in terms of accuracy, recall, and F-value, which indicates that the models and the system developed in this dissertation have certain advantages.
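The SCM idea of segmenting a chat stream first and clustering the resulting fragments second can be illustrated with a minimal sketch. It assumes each message is a `(timestamp, tokens)` pair whose tokens are already word-segmented, splits the stream only on a hypothetical time-gap threshold (`max_gap`; the implicit features used in the thesis are not modelled here), pools each fragment into a bag-of-words vector, and clusters fragments with a plain DBSCAN over cosine distance. All function names and parameter values are illustrative assumptions, not the author's implementation.

```python
import math
from collections import Counter

def segment_by_time_gap(messages, max_gap=300):
    """Split a time-ordered stream of (timestamp, tokens) messages into
    fragments wherever consecutive messages are more than max_gap seconds
    apart. Hypothetical rule: only the temporal feature is used."""
    fragments, current, last_t = [], [], None
    for t, tokens in messages:
        if last_t is not None and t - last_t > max_gap:
            fragments.append(current)
            current = []
        current.append(tokens)
        last_t = t
    if current:
        fragments.append(current)
    return fragments

def fragment_vector(fragment):
    """Bag-of-words counts pooled over all messages in one fragment."""
    return Counter(tok for msg in fragment for tok in msg)

def cosine_distance(a, b):
    """1 - cosine similarity of two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)

def dbscan(points, eps, min_pts, dist):
    """Plain DBSCAN: returns one cluster id per point, -1 for noise."""
    labels = [None] * len(points)  # None = not yet visited

    def neighbors(i):
        # Includes i itself, as in the standard formulation.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_pts:
                queue.extend(j_seeds)  # j is a core point: keep expanding
        cluster += 1
    return labels

# Toy stream: (seconds, pre-segmented tokens); data is invented for illustration.
msgs = [
    (0, ["meet", "tonight", "cash"]),
    (30, ["cash", "tonight"]),
    (1000, ["weather", "rain"]),
    (1040, ["rain", "umbrella"]),
    (2000, ["cash", "meet"]),
]
frags = segment_by_time_gap(msgs, max_gap=300)
vecs = [fragment_vector(f) for f in frags]
labels = dbscan(vecs, eps=0.6, min_pts=2, dist=cosine_distance)
print(labels)  # [0, -1, 0]
```

In this toy stream the first and last fragments share vocabulary and fall into one cluster even though they are far apart in time, while the weather fragment is labelled noise (`-1`). Pooling messages into fragments before clustering yields less sparse vectors than single messages would, which is the motivation the abstract gives for segmenting first.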
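The unsupervised side of NBLT fuses the outputs of several keyword extraction algorithms. A minimal sketch, assuming each SCM cluster is treated as one document for TF-IDF (with idf = log(N / df)) and that scores from different algorithms are merged by min-max normalization followed by a weighted sum. The fusion rule, the weights, and the `topic_scores` dictionary (a stand-in for, e.g., LDA topic-word weights) are hypothetical; the abstract does not specify how the results are combined.

```python
import math
from collections import Counter

def tfidf_scores(clusters):
    """Treat each cluster (a flat token list) as one document and score every
    token by tf * idf, with tf = count / cluster length and idf = log(N / df)."""
    n = len(clusters)
    df = Counter()
    for doc in clusters:
        df.update(set(doc))
    out = []
    for doc in clusters:
        tf = Counter(doc)
        out.append({w: (c / len(doc)) * math.log(n / df[w]) for w, c in tf.items()})
    return out

def top_k(scores, k):
    """Highest-scoring terms first; ties broken alphabetically for stability."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:k]]

def min_max(scores):
    """Rescale one algorithm's scores to [0, 1] so different scorers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {w: 1.0 for w in scores}
    return {w: (s - lo) / (hi - lo) for w, s in scores.items()}

def fuse(score_dicts, weights):
    """Hypothetical fusion rule: weighted sum of min-max-normalized scores."""
    fused = Counter()
    for scores, weight in zip(score_dicts, weights):
        for w, s in min_max(scores).items():
            fused[w] += weight * s
    return dict(fused)

# Toy clusters (already tokenized); data is invented for illustration.
clusters = [
    ["cash", "meet", "cash", "tonight"],
    ["rain", "umbrella", "rain"],
    ["cash", "rain"],
]
tfidf = tfidf_scores(clusters)
print(top_k(tfidf[0], 2))  # ['meet', 'tonight']

# Stand-in for a second scorer (e.g. LDA topic-word weights); values are made up.
topic_scores = {"cash": 0.9, "meet": 0.5, "tonight": 0.1}
print(top_k(fuse([tfidf[0], topic_scores], [0.6, 0.4]), 3))  # ['meet', 'tonight', 'cash']
```

Normalizing before summing keeps one scorer's scale from dominating: TF-IDF values here are around 0.2 to 0.3 while the stand-in topic weights span 0.1 to 0.9, so an unnormalized sum would mostly reproduce the topic ranking.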
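The crime-type classifier trained on extracted keyword sets can be sketched as a multinomial Naive Bayes with add-one (Laplace) smoothing, computed in log space to avoid underflow. The class `KeywordNaiveBayes`, the toy keyword lists for the three case types, and the choice to skip words unseen in training are assumptions for illustration; the thesis's own training corpus and feature settings are not reproduced here.

```python
import math
from collections import Counter, defaultdict

class KeywordNaiveBayes:
    """Hypothetical multinomial Naive Bayes over keyword lists, add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class (priors)
        self.word_counts = defaultdict(Counter)      # keyword counts per class
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.vocab = {w for doc in docs for w in doc}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def log_posterior(self, doc, c):
        """log P(c) + sum of log P(w | c), up to a constant shared by all classes."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[c] / n_docs)
        v = len(self.vocab)
        for w in doc:
            if w not in self.vocab:
                continue  # words never seen in training are ignored
            score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
        return score

    def predict(self, doc):
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))

# Invented keyword sets standing in for extracted keywords of each case type.
docs = [
    ["knife", "blood", "body"], ["bribe", "official", "money"], ["heroin", "deal", "money"],
    ["body", "knife", "police"], ["bribe", "contract"], ["heroin", "gram"],
]
labels = ["homicide", "corruption", "drug", "homicide", "corruption", "drug"]
clf = KeywordNaiveBayes().fit(docs, labels)
print(clf.predict(["knife", "body"]))   # homicide
print(clf.predict(["money", "bribe"]))  # corruption
print(clf.predict(["heroin"]))          # drug
```

Note that `money` appears in both the corruption and drug examples; the classifier still separates `["money", "bribe"]` because `bribe` occurs only in corruption documents, which is the kind of evidence a well-chosen keyword set is meant to supply.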
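The abstract reports accuracy, recall, and F-value without defining them. For keyword extraction these are commonly computed from the overlap between the extracted set and a reference set. A minimal sketch, assuming the reported "accuracy" corresponds to precision over the extracted keywords and the F-value is the balanced harmonic mean (F1); both are assumptions, and the function name is hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Set-overlap metrics for one document's keywords.
    Assumption: 'accuracy' in the abstract is read as precision."""
    pred, ref = set(predicted), set(gold)
    hits = len(pred & ref)                    # keywords that are both extracted and correct
    p = hits / len(pred) if pred else 0.0     # share of extracted keywords that are correct
    r = hits / len(ref) if ref else 0.0       # share of reference keywords that were found
    f = 2 * p * r / (p + r) if p + r else 0.0 # harmonic mean of p and r
    return p, r, f

# Invented example: 2 of 4 extracted keywords match 2 of 3 reference keywords.
p, r, f = precision_recall_f1(["meet", "cash", "tonight", "rain"], ["cash", "meet", "knife"])
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 0.667 0.571
```

Corpus-level figures are then usually either averaged per document (macro) or computed from pooled hit counts (micro); which convention the experiments used is not stated in the abstract.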
Keywords: Conversation segmentation, Text clustering, Keyword extraction, Text classification, Text analysis system