Research On Text Context Analysis With Frequent Patterns

Posted on:2016-09-27

Degree:Master

Type:Thesis

Country:China

Candidate:S N Song

Full Text:PDF

GTID:2298330467998861

Subject:Computer Science and Technology

Abstract/Summary:

With the development of information technology, information-centric users have driven unprecedented growth in text data mining. Efficiently organizing, classifying, labeling, and extracting relevant information has received much attention, and correlation mining, associative classification, and frequent-pattern-based clustering have been focal themes in data mining research for several years. However, as data grow in size and organizational complexity, quickly and accurately locating key items in a huge data source is no longer enough. Analyzing the implicit information behind key data and the relationships between data set structures is also of great significance. This thesis focuses on text mining and text data analysis, and studies pattern context analysis based on frequent pattern mining.

Text context analysis with frequent patterns is a post-processing of frequent pattern mining results and, like pattern classification and clustering, is an application of pattern mining research. Pattern context analysis connects data units of different sizes on the basis of statistical correlation, which supports data analysis, classification, and interpretation. Using a context model of frequent patterns to automatically generate semantic annotations (SPA) is a recent topic in context modeling research, and SPA is an automatic, unsupervised process. Compared with the static information obtained during traditional pattern mining, such as support and confidence, context focuses on the relationship between patterns and data sets, and builds on the similarity relationship between patterns and transactions.

This thesis discusses the concrete process of context modeling and the characteristics and limitations of the related algorithms and models. The context of a single pattern meets only part of the demand in text mining: for analyzing an object composed of multiple patterns, such as a transaction, relevant definitions and effective methods are still lacking. At the same time, single-pattern context has limitations for text semantic analysis, which show up on large data sets as noise, such as semantic annotations of anomalous patterns. To address this, a multi-pattern context analysis (mPCA) method is proposed. The method computes the context model on the basis of the vector space model and gives four definitions of pattern-combination context: the average context, the maximum context, the entropy context, and the partial context. Their features and definitions are discussed together with the related reasoning and proofs. A more complete context analysis process is provided, and the concept of context description sets and their construction method are also introduced, which provides a reference for different application scenarios. Finally, the mPCA method and the context analysis process are tested on two text data sets.

The experimental results show that mPCA outperforms SPA, TF-IDF, and the classic LCS method in text semantic analysis, which indicates that the multi-pattern combination and transaction contexts built by mPCA retain semantics to a greater extent. Controlled experiments comparing mPCA with TF-IDF show that mPCA is less affected by word-frequency limitations, and that the transaction context vector yields better results than the TF-IDF document feature vector. Experiments on data sets of different complexity and size show that mPCA obtains good results. In addition, mPCA reduces the dimensionality of the model retrieval space while preserving, or even enhancing, the semantic carrying capacity of the space vectors.
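The abstract names the combination contexts without defining them, so the following is only an illustrative sketch of the general idea, not the thesis's method. It assumes that a pattern's context vector is a vector of co-occurrence counts over a vocabulary, that the "average" and "maximum" contexts are the element-wise mean and maximum of the single-pattern vectors, and that similarity is measured with cosine. All function names and the toy data are hypothetical; the entropy and partial contexts are omitted because their definitions cannot be inferred from the abstract.

```python
from collections import Counter
from math import sqrt


def context_vector(pattern, transactions, vocabulary):
    """Count how often each vocabulary term co-occurs with `pattern`.

    Assumption: context weight = raw co-occurrence count.
    """
    counts = Counter()
    for transaction in transactions:
        if pattern <= transaction:  # pattern is contained in the transaction
            counts.update(transaction - pattern)
    return [float(counts[term]) for term in vocabulary]


def average_context(vectors):
    """Element-wise mean of several context vectors (assumed definition)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def maximum_context(vectors):
    """Element-wise maximum of several context vectors (assumed definition)."""
    return [max(column) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity in the vector space model; 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy transactions (hypothetical data): each one is a set of terms.
transactions = [
    frozenset({"text", "mining", "pattern"}),
    frozenset({"text", "mining", "context"}),
    frozenset({"pattern", "context", "vector"}),
]
vocabulary = sorted(set().union(*transactions))

v_text = context_vector(frozenset({"text"}), transactions, vocabulary)
v_pattern = context_vector(frozenset({"pattern"}), transactions, vocabulary)

avg_ctx = average_context([v_text, v_pattern])
max_ctx = maximum_context([v_text, v_pattern])
similarity = cosine(avg_ctx, max_ctx)
```

Under these assumptions, a transaction made of several patterns gets a single vector that can be compared with any other context vector using `cosine`, which is the kind of multi-pattern comparison the abstract describes.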

Keywords/Search Tags:

Text Mining, Frequent Pattern, Context Modeling, Vector Space, Semantic Analysis

Related items

1	Study On Frequent Pattern Mining Algorithms And Pruning Strategies
2	Research On Semantic Frequent Pattern Mining Algorithm Based On Trajectory Data
3	Semantic Similarity Calculation Text Field Vector Space Model
4	The Research And Realization Of Mining Frequent Patterns On Business Data Streams
5	The Research On The Related Problems Of Association Rule Mining
6	A Study On Algorithms Of Weighted Frequent Pattern Mining
7	Research And Implementation Of Bad Message Text Detection Method Based On Frequent Pattern Mining
8	The Analysis On The Basic Techniques For Preprocess Of Text Mining And The Study On The Application Of Text Mining
9	Constraint-Based Frequent Pattern Mining: Novel Applications And New Techniques
10	Research On Website Optimization Strategy Based On Frequent Pattern Mining