
Research On Text Context Analysis With Frequent Patterns

Posted on: 2016-09-27  Degree: Master  Type: Thesis
Country: China  Candidate: S N Song  Full Text: PDF
GTID: 2298330467998861  Subject: Computer Science and Technology
Abstract/Summary:
With the development of information technology, information-centric users have driven unprecedentedly rapid growth in text data mining. Efficiently organizing, classifying, labeling, and extracting relevant information has received much attention, and correlation mining, associative classification, and frequent pattern-based clustering have been focal themes in data mining research for several years. However, as data sizes grow and organizational structures become more complex, people are no longer satisfied with merely locating key items quickly and accurately in a huge data source. Analyzing the implicit information behind key data and the relationships between data set structures is also of great significance. This thesis focuses on text mining and text data analysis, and studies and discusses pattern context analysis based on frequent pattern mining.

Text context analysis with frequent patterns reprocesses the results of frequent pattern mining; it is an application of pattern mining research, like pattern classification and clustering. Pattern context analysis establishes connections between data units of different sizes on the basis of statistical correlation, which supports data analysis, classification, and interpretation. Using a context model for frequent patterns to automatically generate semantic annotations (SPA) is a recent topic in context modeling research, and SPA is an automatic, unsupervised process. Compared with the static information traditionally obtained during mining, such as support and confidence, context focuses on the relationship between patterns and data sets and builds on the similarity relationship between patterns and transactions.

This thesis discusses the concrete process of context modeling and the characteristics and limitations of the related algorithms and models. Single-pattern context can meet only part of the demand in text mining: for analyzing an object composed of multiple patterns, such as a transaction, relevant definitions and effective methods are still lacking. It also has limitations for text semantic analysis, which show up, for example, in the semantic annotation of anomalous patterns when the data contain widely distributed noise. On this basis, a multi-pattern context analysis (mPCA) method is proposed. The method computes context models on the basis of the vector space model and gives four definitions of pattern combination contexts: the average context, the maximum context, the entropy context, and the partial context. Their definitions and properties are discussed together with the related reasoning and proofs. A more complete context analysis process is provided, and the concept of context description sets and their construction method are also introduced, which provides a reference for different application scenarios. Finally, the mPCA method and the context analysis process are tested on two text data sets.

The experimental results show that mPCA outperforms SPA, TF-IDF, and the classic LCS method in text semantic analysis, which indicates that the multi-pattern combination and transaction contexts built by mPCA retain semantics to a greater extent. Controlled experiments comparing mPCA with TF-IDF show that mPCA is less affected by word-frequency limitations, and that its transaction context vectors yield better results than TF-IDF document feature vectors. Experiments on data sets of different complexity and size show that mPCA obtains good results. In addition, mPCA reduces the dimensionality of the model retrieval space without reducing, and in some cases even enhancing, the semantic carrying capacity of the space vectors.
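The abstract names the combination contexts but does not give their formulas, so the following is only a minimal sketch of how two of them might look in a vector space model. It assumes that a pattern's context is a vector of co-occurrence counts over the transactions containing it, that the average context is the element-wise mean of the pattern contexts, and that the maximum context is their element-wise maximum. All function names are hypothetical, and the entropy and partial contexts are left out because the abstract does not describe them.

```python
import math
from collections import Counter


def pattern_context(pattern, transactions):
    """Assumed context of one pattern: counts of the other words that
    co-occur with it in every transaction containing the pattern."""
    ctx = Counter()
    for t in transactions:
        if pattern <= t:  # the transaction contains the whole pattern
            ctx.update(t - pattern)
    return dict(ctx)


def average_context(contexts):
    """Assumed 'average context': element-wise mean of the context vectors."""
    total = Counter()
    for c in contexts:
        total.update(c)
    n = len(contexts)
    return {k: v / n for k, v in total.items()}


def maximum_context(contexts):
    """Assumed 'maximum context': element-wise maximum of the context vectors."""
    out = {}
    for c in contexts:
        for k, v in c.items():
            out[k] = max(out.get(k, 0), v)
    return out


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Toy transactions (illustrative data, not from the thesis).
transactions = [
    frozenset({"text", "mining", "pattern"}),
    frozenset({"text", "mining", "context"}),
    frozenset({"pattern", "context"}),
]
contexts = [
    pattern_context(frozenset({"text"}), transactions),
    pattern_context(frozenset({"pattern"}), transactions),
]
combined_avg = average_context(contexts)
combined_max = maximum_context(contexts)
similarity = cosine(combined_avg, combined_max)
```

Under these assumptions, a transaction made of several patterns gets one combined vector, which can then be compared with other vectors by cosine similarity, in the same way a TF-IDF document vector would be.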
Keywords/Search Tags: Text Mining, Frequent Pattern, Context Modeling, Vector Space, Semantic Analysis