Research And Implementation Of Person Relationship Map Based On Co-Occurrence And Association Mining

Posted on: 2020-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: J Li
GTID: 2428330572493941
Subject: Computer technology
Abstract/Summary:
The pace of modern life keeps increasing, and it is difficult to set aside much time for reading. This thesis provides methods for quickly understanding the relationships between the characters in an article. The data obtained are used to identify the main characters of the whole article, and the character relationship diagram displays each person's circle of relationships. This helps the reader grasp how closely the characters are related before reading the full text, which greatly reduces reading time. The thesis selects "White Deer" as the research object and applies co-occurrence analysis and association rule mining to it.

The program is written in Python. It extracts the person-name nodes from the text by co-occurrence analysis and assigns each node a weight; at the same time it extracts the weight of the edge between each pair of nodes in the corpus. The co-word matrix is constructed from the extracted nodes and keyword pairs. To obtain the similarity matrix, similarity is measured with the Ochiai coefficient: the more closely two keywords are related, the larger the value and the higher the similarity. The Euclidean distance is the most intuitive measure of the straight-line distance between two points. The SPSS statistical software is used to compute the Euclidean distances for the co-word matrix: the larger the distance, the greater the difference, and the smaller the distance, the higher the similarity. To analyze the clustering of the co-word matrix more thoroughly, both R-type and Q-type clustering are performed on it. R-type clustering shows how closely individual variables are related and also how near or far combinations of variables are from one another. Q-type clustering groups cases according to variable information, and the resulting dendrogram illustrates the results of the cluster analysis.

To draw the character relationship map, the text files containing the extracted node and edge information are each converted to .CSV format and imported into Gephi, and the character relationship diagram is drawn according to the pre-designed requirements. The closeness between characters can be analyzed more intuitively from the resulting diagram.

Weka is used as an auxiliary tool for mining association rules, with the commonly used Apriori algorithm. Constructing the data set is an important step in applying Apriori: the whole text is treated as a database and divided by chapter, the keywords appearing in each chapter form one record, and the keyword lists of all chapters are combined to form the data set. The database is scanned multiple times, frequent itemsets are found in the constructed data set, and association rules between characters are derived from them.
Keywords/Search Tags:Co-occurrence Analysis, Cluster Analysis, Person Relationship Diagram, Frequent Itemsets, Association Rules
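
The co-occurrence counting and Ochiai similarity described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual program. It assumes each record is the list of character names found in one chapter (the abstract states this unit only for the Apriori data set), uses the standard Ochiai formula c_ab / sqrt(c_a * c_b), and adds the usual support and confidence measures for a rule a -> b. The function names and the placeholder names "A", "B", "C" are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence(records):
    """Count records containing each name (node weight) and each pair (edge weight)."""
    node_counts = Counter()
    edge_counts = Counter()
    for names in records:
        unique = sorted(set(names))  # count a name at most once per record
        node_counts.update(unique)
        edge_counts.update(combinations(unique, 2))  # pairs in sorted order
    return node_counts, edge_counts


def ochiai(node_counts, edge_counts, a, b):
    """Ochiai coefficient: co-occurrences divided by sqrt of the product of counts."""
    pair = tuple(sorted((a, b)))
    denom = sqrt(node_counts[a] * node_counts[b])
    return edge_counts[pair] / denom if denom else 0.0


def rule_metrics(node_counts, edge_counts, n_records, a, b):
    """Support and confidence of the association rule a -> b."""
    pair = tuple(sorted((a, b)))
    support = edge_counts[pair] / n_records
    confidence = edge_counts[pair] / node_counts[a] if node_counts[a] else 0.0
    return support, confidence


# Hypothetical records: one list of names per chapter.
records = [
    ["A", "B", "C"],
    ["A", "B"],
    ["B", "C"],
    ["A"],
]
nodes, edges = cooccurrence(records)
similarity_ab = ochiai(nodes, edges, "A", "B")
support_ab, confidence_ab = rule_metrics(nodes, edges, len(records), "A", "B")
```

The node and edge counts here correspond to the weights that would be written to the node and edge .CSV files for Gephi. A full Apriori run would extend the pair counting to larger itemsets, which this sketch does not attempt.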