Research On Cultural Calculation Of Semi-structured Data

Posted on: 2016-05-13
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R G Te
Full Text: PDF
GTID: 1368330473461737
Subject: Computer software and theory
Abstract/Summary:
Culture is expressed in many forms, such as music, dance, patterns, text and language, and cultural computing needs to represent these forms digitally. This dissertation studies semi-structured storage models for music and patterns together with the related data mining methods, and then discusses several problems of cultural computing methods.
A closer study of frequent pattern mining in music, which varies the content mined from music documents, suggests that mining frequent patterns does not necessarily make it possible to judge the attributes of a piece of music. There is also no uniform or generally recognized standard for the scale of frequent music patterns, and the diversity and poor suitability of music document structures continue to limit the application of data mining techniques. The dissertation therefore proposes S-MusicXML, a score model based on semi-structured XML. S-MusicXML describes directly and clearly the music genes that most researchers care about, so that they do not have to be extracted from the music, which makes data mining techniques easier to apply.
For pattern culture, the dissertation introduces an XML data model, described using a fan network model, for the fabric patterns of China's ethnic minorities. This shows that semi-structured data models can store both music and cultural patterns, and it lays the foundation for analyzing such data.
Building on these models, the study finds substructures in both music and patterns that map to cultural attributes, and consultation with domain experts confirmed that the content of these substructures constitutes cultural characteristics. Culture genes can thus be extracted from the semi-structured music and pattern models, which indirectly supports the existence of culture genes.
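The idea of describing music genes directly in a semi-structured document, rather than extracting them from the notes, can be illustrated with a minimal sketch. The abstract does not give the S-MusicXML schema, so every element and attribute name below (score, genes, gene, notes, note) and the sample values are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: these element and attribute names are illustrative
# assumptions, not the actual S-MusicXML schema.
SCORE_XML = """
<score title="example">
  <genes>
    <gene name="mode">pentatonic</gene>
    <gene name="meter">2/4</gene>
  </genes>
  <notes>
    <note pitch="G4" duration="1"/>
    <note pitch="A4" duration="1"/>
    <note pitch="C5" duration="2"/>
  </notes>
</score>
"""


def read_genes(xml_text):
    """Return the explicitly stored features as a name -> value mapping."""
    root = ET.fromstring(xml_text)
    return {
        gene.get("name"): (gene.text or "").strip()
        for gene in root.findall("./genes/gene")
    }


def read_pitches(xml_text):
    """Return the pitch of every note, in document order."""
    root = ET.fromstring(xml_text)
    return [note.get("pitch") for note in root.iter("note")]
```

With a layout of this kind, a mining step can read the stored features with a single path query instead of deriving them from the note sequence.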
The similarity measurement methods are mainly the pattern similarity measure algorithm (PSMA), used to analyze the similarity of music gene patterns, and fractal analysis, used for patterns.
Summarizing the data mining methods for these two different kinds of culture genes, music and patterns, the dissertation puts forward two concepts: the event similarity measure (ESM) and the hidden subtree. ESM is intended to overcome the limitations that the uncertainty of heterogeneous data imposes on similarity measurement. ESM expresses an arbitrary event as a function, so that comparing two events becomes comparing the forms of two functions, and event similarity can therefore be measured by a general method. Examples of several different types show that expressing events as functions is feasible.
The hidden subtree is intended to mine the association structure between the subtrees of a semi-structured document and, from it, the theme of the document, which is then used for culture mining. Because the hidden subtree is a concept defined for all semi-structured data, it can also be applied to analyzing reference relationships between different documents. The algorithm is further applied to websites, music, artificial data and other types of semi-structured documents, and the reliability of treating the mined hidden subtree as the document theme is discussed.
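The ESM idea of turning a comparison of events into a comparison of functions can be sketched as follows. The abstract does not say how the two function forms are compared, so this sketch substitutes a simple, generic choice: both functions are sampled on a shared grid and the samples are compared by cosine similarity. The function names, the grid and the similarity formula are all assumptions.

```python
import math


def sample(f, start, stop, n):
    """Evaluate f at n evenly spaced points in [start, stop] (n >= 2)."""
    step = (stop - start) / (n - 1)
    return [f(start + i * step) for i in range(n)]


def function_similarity(f, g, start=0.0, stop=1.0, n=101):
    """Cosine similarity of two functions sampled on a shared grid.

    This is an assumed stand-in for comparing 'function forms'; it is not
    the ESM definition from the dissertation.
    """
    a = sample(f, start, stop, n)
    b = sample(g, start, stop, n)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a function that is zero everywhere has no direction
    return dot / (norm_a * norm_b)
```

Because heterogeneous events are first mapped to functions over a common domain, the same comparison routine can serve events of different kinds, which is the property the abstract attributes to ESM.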
The experimental results show that the theme mining algorithm preserves the document theme better than existing comparable algorithms and applies to a wider range of areas, so the scheme is effective and feasible.
Because semi-structured documents occupy a large amount of space, the dissertation finally introduces a compressed storage scheme for semi-structured documents and a storage method for part of the culture mining results: the dynamic ordered tree storage model and the sorted-sequence compression scheme CSN (Compress sequence number).
The dynamic ordered tree storage model is motivated by XML, the representative semi-structured data model, whose documents tend to be large. Compressing an XML document with a binary encoding effectively reduces storage space, but traditional encodings make operations on the ordered tree difficult. The proposed model is an encapsulated structure that both stores ordered trees compactly and supports dynamic operations on them. The method splits the binary code of the ordered tree into segments according to its structure, which reduces the amount of code that has to be modified, and uses a three-relocation method to quickly select the package to modify. To address the loss of node meaning once the ordered tree becomes dynamic, an auxiliary description table keyed by tree node number is proposed. The table records the content and meaning of each node and so compensates for the fact that the binary code can express only the tree structure.
Because the result of a similarity calculation is a sorted sequence, the dissertation proposes CSN as a compression method for sorted sequences. A study of the decompression algorithm proves theoretically that its result is unique and correct.
Finally, experimental comparison with other compression schemes shows that the CSN compression algorithm achieves a high compression rate on integer-type documents.
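The abstract does not describe how CSN works, so the following sketch is not CSN. It uses a well-known generic technique, gap (delta) encoding followed by a variable-length byte code, only to illustrate why a sorted integer sequence compresses well and what a unique, correct decompression looks like. The function names are assumptions.

```python
def compress_sorted(values):
    """Encode a non-decreasing list of non-negative ints as gap varints."""
    out = bytearray()
    prev = 0
    for v in values:
        gap = v - prev
        if gap < 0:
            raise ValueError("input must be sorted non-negative integers")
        prev = v
        # Emit the gap 7 bits at a time; the high bit marks "more bytes follow".
        while True:
            byte = gap & 0x7F
            gap >>= 7
            if gap:
                out.append(byte | 0x80)
            else:
                out.append(byte)
                break
    return bytes(out)


def decompress_sorted(data):
    """Invert compress_sorted by rebuilding each gap and summing the gaps."""
    values = []
    prev = 0
    gap = 0
    shift = 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # continuation byte: keep accumulating this gap
        else:
            prev += gap
            values.append(prev)
            gap = 0
            shift = 0
    return values
```

Each gap is usually much smaller than the value it replaces, so most entries fit in one or two bytes, and every byte string produced by the encoder decodes to exactly one sequence.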
Keywords/Search Tags: form of culture, culture mining, frequent pattern, S-MusicXML, event similarity, XML, dynamic ordered tree